de Crécy-Lagard et al. Biology Direct 2012, 7:32
http://www.biology-direct.com/content/7/1/32

BIOLOGY DIRECT

Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage

Valérie de Crécy-Lagard1*, Farhad Forouhar2, Céline Brochier-Armanet3, Liang Tong2 and John F Hunt2

Abstract

Background: The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, genes were linked to functions by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusions, co-expression profiles, structural information and other genomic or post-genomic associations can now be used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.

Results: The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions: This detailed analysis of the DUF71 family members provides an example of the power of integrated data mining for solving important "missing genes" or "missing function" cases and illustrates the danger of functional annotation of protein families by homology alone.

Reviewers' names: This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.

Keywords: Diphthamide, Vitamin B12, Amidotransferase, Comparative genomics

*Correspondence: vcrecy@ufl.edu
Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
Full list of author information is available at the end of the article

© 2012 de Crécy-Lagard et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

In both Archaea and Eucarya, the translation Elongation Factor 2 (EF-2) harbors a complex post-translational modification of a strictly conserved histidine (His699 in yeast) called diphthamide [1]. This modification is the target of the diphtheria toxin and the Pseudomonas exotoxin A, which inactivate EF-2 by ADP-ribosylation of the diphthamide [2,3]. Although the diphthamide biosynthesis pathway was described in the early 1980s [2,3], the corresponding enzymes have only recently been characterized. In vitro reconstitution experiments have shown that the first step, the transfer of a 3-amino-3-carboxypropyl (ACP) group from S-adenosylmethionine (SAM) to the C-2 position of the imidazole ring of the target histidine residue, is catalyzed in Archaea by the iron-sulfur-cluster enzyme Dph2 [4,5] (Figure 1A). Genetic and complementation studies have shown that catalysis of the same first step requires four proteins (Dph1-Dph4) in yeast and other eukaryotes [6-9]. The subsequent step, trimethylation of an amino group to form the diphthine intermediate, is catalyzed by diphthine synthase, Dph5 (EC 2.1.1.98) (Figure 1A) [10,11]. The last step, the ATP-dependent amidation of the carboxylate group [12], is catalyzed by diphthine-ammonia ligase (EC 6.3.1.14), but the corresponding gene has not been identified (http://www.orenza.u-psud.fr/). A protein involved in this last step was recently identified in yeast (YBR246W or Dph7), but it is most certainly not directly involved in catalysis, as it is not conserved in Archaea and it contains a WD-domain likely to be involved in protein/protein interactions [13].

Figure 1 Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three enzymes, Dph2, Dph5 and Dph6, in Archaea. The formation of diphthine has been reconstituted in vitro using Dph2 and Dph5 from Pyrococcus horikoshii [4,5]. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require additional proteins (Dph1, Dph3 and Dph7). (B) Predicted Dph6-catalyzed reactions. (C) Ado-Pseudo-B12 structure and hydrolysis site by the bacterial CbiZ enzyme (bCbiZ). Parts (A) and (B) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin; Biochemistry 2010, 49, 9649-9657. Copyright 2010 American Chemical Society.
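Read together with Figure 1B, the ATP-dependent last step can be written as a two-stage scheme. This is only a sketch: the adenylated intermediate follows the predicted reactions of Figure 1B, whereas the release of pyrophosphate (PPi) in the first stage is an assumption based on standard adenylation chemistry and is not stated in the text.

```latex
\begin{align}
\text{EF-2--diphthine} + \text{ATP}
  &\longrightarrow \text{EF-2--diphthine--AMP} + \text{PP}_i
  && \text{(activation; PP$_i$ release assumed)} \\
\text{EF-2--diphthine--AMP} + \text{NH}_3
  &\longrightarrow \text{EF-2--diphthamide} + \text{AMP}
  && \text{(amidation)}
\end{align}
```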
Using a combination of comparative genomic approaches, we set out to identify a candidate gene for this orphan enzyme family. Based on taxonomic distribution, domain organization of gene fusions, physical clustering on chromosomes, atomic structural data, co-expression, and phenotype data, a promising candidate was identified: the family called Domain of Unknown Function family DUF71 (IPR002761) in InterPro [14]. This family is also called ATP_bind_4 (PF01902) in Pfam [15] or Predicted ATPases of PP-loop superfamily (COG2102) in the Clusters of Orthologous Groups database [16]. However, detailed analysis of the DUF71 family revealed that this family is almost surely not isofunctional. Some Archaea contain two very divergent copies of the gene, while homologs are found in Bacteria, which are known to lack diphthamide. This observation suggests that some DUF71 members have different functions and probably participate in different biochemical pathways.

Methods

Comparative genomics

The BLAST tools [17] and resources at NCBI (http://www.ncbi.nlm.nih.gov/) were routinely used. Multiple sequence alignments were built using ClustalW [18] or Multalin [19]. Protein domain analysis was performed using the Pfam database tools (http://pfam.janelia.org/) [15]. Analysis of the phylogenetic distribution and physical clustering was performed in the SEED database [20]. Results are available in the "Diphthamide biosynthesis" and "DUF71-B12" subsystems on the public SEED server (http://pubseed.theseed.org/SubsysEditor.cgi). Phylogenetic profile searches were performed on the IMG platform [21] using the phylogenetic query tool (http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm).
Physical clustering was analyzed with the SEED subsystem coloring tool or the SeedViewer Compare Regions tool [20] as well as on the MicrobesOnline (http://www.microbesonline.org/) tree-based genome browser [22]. The SPELL microarray analysis resource [23] was used through the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) [24] to analyze yeast gene coexpression profiles. Clustering of yeast deletion mutants based on phenotype analysis was analyzed through the yeast fitness database available at http://fitdb.stanford.edu/ [25,26]. Mappings of gene distribution profiles onto taxonomic trees were generated using the iTOL suite (http://itol.embl.de/index.shtml) [27]. Sequence logos were derived using the WebLogo platform [28].

Structure analysis

Visualization and comparison of protein structures and manual docking of ligand molecules were performed using PyMOL (The PyMOL Molecular Graphics System, Version 1.4.1, Schrödinger, LLC). XtalView [7] was used for the protein docking exercises.

Phylogenetic analyses

The survey of the 1996 complete prokaryotic genomes available at the NCBI (http://www.ncbi.nlm.nih.gov/) using BLASTP [17] (default parameters) allowed identification of 119 bacterial and 144 archaeal DUF71 homologs, in addition to the 182 eukaryotic homologs identified in the RefSeq database at the NCBI [29] (Additional file 1: Table S1). The retrieved sequences were aligned using MAFFT [8] and the resulting alignment was visually inspected using ED, the alignment editor of the MUST package [30]. The phylogenetic analysis of the 445 sequences was performed using the neighbor-joining distance method implemented in SeaView [31]. The robustness of the resulting tree was assessed by the non-parametric bootstrap method (100 replicates of the original dataset) implemented in SeaView. A second phylogenetic analysis, restricted to 50 archaeal and eukaryotic homologs representative of the genetic and genomic diversity of these two Domains, was performed using the Bayesian approach implemented in Phylobayes [6] with an LG model.

Results and discussion

Comparative genomics points to DUF71/COG2102 as a strong candidate for the missing diphthamide synthase family

The distribution of known diphthamide biosynthesis genes in Archaea was analyzed using the SEED database and its tools [20]. The 59 archaeal genomes analyzed all contained an EF-2 encoding gene. Analysis of the distribution of Dph2 and Dph5 in Archaea showed that 58/59 genomes encoded these two proteins. The only archaeon lacking both Dph2 and Dph5 was Korarchaeum cryptofilum OPF8 (Figure 2A). We therefore hypothesized that this organism has lost the diphthamide modification pathway, even though the K. cryptofilum EF-2 still harbors the conserved His residue at the site of the modification (His603 in the K. cryptofilum sequence, Accession B1L7Q0 in UniProtKB).

Using the IMG/JGI phylogenetic query tools [21], we searched for protein families found in all Archaea except Korarchaeum cryptofilum OPF8, present in Saccharomyces cerevisiae and Homo sapiens, but absent in Escherichia coli and Bacillus subtilis, as bacteria are known to lack this modification pathway. Only one family, DUF71/COG2102, followed this taxonomic distribution. This family had been described previously as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain [32].

Using the neighborhood analysis tool of the SEED database [20], physical clustering was generally not observed between the dph2, dph5 and DUF71 genes, except in three Methanosarcina genomes where dph5 is located in the vicinity of DUF71 genes (Figure 2B). If members of the DUF71 family catalyze the last step of diphthamide synthesis, they should bind ATP [12].
Structural analysis of the DUF71 protein from Pyrococcus furiosus (PF0828) reveals the presence of two distinct domains: an N-terminal HUP domain that contains a highly conserved PP-motif that interacts with ATP (PDB id: 3RK1) and AMP (PDB id: 3RK0), and a C-terminal 100-residue domain belonging to a novel fold with a highly conserved motif GEGGEF/YE188T/S (P. furiosus numbering) that is probably involved in substrate binding and recognition [33].

Figure 2 Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and of DUF71 in Archaea, according to data derived from the "Diphthamide biosynthesis" subsystem in the SEED database. The tree is a species tree constructed in iTol (itol.embl.de/). The presence and absence of the specific genes was derived from the "Diphthamide biosynthesis" subsystem. (B) Physical clustering of DUF71/COG2102 genes with Dph genes in three Methanosarcina genomes, derived from the MicrobesOnline database (http://www.microbesonline.org/). (C) Examples of domains fused to DUF71 in Archaea and Eucarya. Accession numbers and COG, CDD, or Pfam domain numbers are given in parentheses.
[Figure image not reproduced. Labels legible in panels B and C: Methanosarcina mazei, Methanosarcina acetivorans C2A, Methanosarcina barkeri str. fusaro; Methanohalophilus mahii YP_003542089.1, Methanosalsum zhilinae YP_004615790, Candidatus Nanosalina sp. J07AB43 EGQ42883.1, S. cerevisiae YLR143W, Arabidopsis thaliana AT3g04480; domains GAT-II (cd00352), Asn_Synthase_B_C (cd01991), AsnB (COG0367), Duf71 (PF01902), (PF01042).]

Coexpression, phenotype and structural data link the yeast DUF71 to translation and diphthamide biosynthesis

YLR143w is the only S. cerevisiae DUF71 family member. Using YLR143w as input in the SPELL co-expression query tool [23] showed that nearly all co-expressed genes were involved in translation and ribosome biogenesis (Additional file 2: Table S2). This observation suggested that the DUF71 protein family has a role in translation, as expected for a protein modifying EF-2. Like all known diphthamide synthesis genes, YLR143w is not essential. More specifically, deletion of any of the five known diphthamide genes confers sordarin resistance in yeast [34,35], and the ylr143wΔ strain was shown to be as resistant to this compound as the diphthamide-deficient strains (see supplemental data in [34]). Furthermore, in a recent complete analysis of relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast [25,26], available at http://fitdb.stanford.edu/, it was found that among the top ten interactors with YLR143w by homozygous co-sensitivity are DPH5, DPH2, DPH4 (or JJJ3) and the newly identified DPH7 (or YBR246w) (Additional file 3: Figure S1). Both the coexpression and phenotype data thereby strongly support the hypothesis that YLR143w catalyzes the missing last step of diphthamide biosynthesis, even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required.
Finally, comparison of the ATP- and AMP-containing structures of PF0828 reveals that the active site of the former has a narrow groove at the end of which only the α-phosphate of ATP is exposed to the solvent, whereas the active site of the latter is wide open (Figure 3A and B). Also, there is a sharp turn at the α-phosphate of ATP, suggesting that it is the site of the nucleophilic attack. We therefore performed a docking exercise using the EF-2 structure (PDB id: 3B82) [36] with the ATP-containing structure of PF0828. The docking revealed that the active site groove of the ATP-containing structure can easily accommodate diphthine with a few minor clashes between the two structures (Figure 3A and B). The modeling also showed that the carboxyl group of diphthine resides near the α-phosphate of ATP and the carboxylate group of residue Glu188, suggesting that nucleophilic attack by diphthine on the α-phosphate of ATP is highly feasible (Figure 3B). As shown in Figure 3B, the modeling also shows that several residues besides E188 that are highly conserved among archaeal and eukaryotic PF0828 and YLR143w orthologs, including S44, Y45, E78, Y103, Q104, A149, E183 and E186 (Additional file 3: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure 3B).

Figure 3 Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto the ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (B) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (C) Stereo pair view of the ATP-binding region of PF0828. Residues that are conserved among the Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to the Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue in all stick models, respectively.

Linking DUF71 family members to ammonia transfer reactions

The diphthine-ammonia ligase reaction requires a source of NH3 [12]. Domain fusions involving members of the DUF71 family in the Pfam database [15] suggest that the source of NH3 might vary depending on the organism. For example, in a few Archaea (e.g. Methanohalophilus mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or 'Candidatus Nanosalinarum sp. J07AB56'), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure 2C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) [37] and an Asn_SynthaseBC PP-loop ATPase domain (Figure 2C). This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_SynthaseBC enzymes. On the other hand, in many eukaryotes such as yeast and Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein, as previously noted by Aravind et al. [32] (Figure 2C).
The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH3 [38]. The RidA domains fused to DUF71 could therefore be involved in providing the NH3 moiety for diphthamide synthesis.

The DUF71 family is not monofunctional

The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon K. cryptofilum (Table 1 and Additional file 1: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table 1 and Additional file 1: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages.

To further investigate the evolutionary history of DUF71, we performed a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and P. furiosus PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure 4 and Additional file 3: Figure S3). This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy, because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure 4).
These observations, together with the extremely patchy distribution of DUF71 in bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this Domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing them to acquire a second DUF71 homolog.

In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probability (PP) = 1). Moreover, it recovered the monophyly of most major eukaryotic and archaeal lineages (most PP > 0.95, Additional file 3: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP < 0.95), precluding an in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file 3: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as Microsporidia, Cryptosporidium, Entamoeba or Nanoarchaeum equitans, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms [39] suggested putative cases of HGT.
For instance, the Thermofilum pendens DUF71 robustly groups with Methanomicrobia (Euryarchaeota) and not with other Thermoproteales (Additional file 3: Figure S3).

Table 1 Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum                      Nb (%) genomes
Archaea
  Crenarchaeota             37/37 (100%)
  Euryarchaeota             79/79 (100%)
  Korarchaeota              0/1 (0%)
  Thaumarchaeota            2/2 (100%)
Bacteria
  Acidobacteria             3/7 (42.9%)
  Actinobacteria            1/206 (0.5%)
  Aquificae                 0/10 (0%)
  Bacteroidetes             20/73 (27.4%)
  Caldiserica               0/1 (0%)
  Chlorobi                  0/11 (0%)
  Chloroflexi               5/16 (31.3%)
  Chrysiogenetes            0/1 (0%)
  Cyanobacteria             0/45 (0%)
  Deferribacteres           0/4 (0%)
  Deinococcus-Thermus       2/17 (11.8%)
  Dictyoglomi               0/2 (0%)
  Elusimicrobia             0/2 (0%)
  Fibrobacteres             0/2 (0%)
  Firmicutes                20/484 (4.1%)
  Fusobacteria              0/5 (0%)
  Gemmatimonadetes          0/1 (0%)
  Ignavibacteria            0/1 (0%)
  Nitrospirae               1/3 (33.3%)
  Proteobacteria Alpha      2/204 (1%)
  Proteobacteria Beta       8/119 (6.7%)
  Proteobacteria Delta      1/48 (2.1%)
  Proteobacteria Epsilon    0/64 (0%)
  Proteobacteria Gamma      27/406 (6.7%)
  PVC Chlamydiae            1/73 (1.4%)
  PVC Planctomycetes        3/6 (50%)
  PVC Verrucomicrobia       0/4 (0%)
  Spirochaetes              1/45 (2.2%)
  Synergistetes             0/4 (0%)
  Thermodesulfobacteria     0/2 (0%)
  Thermotogae               5/14 (35.7%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.

Figure 4 Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity, only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3: Figure S3.
[Figure image not reproduced. Clusters labeled on the tree: Cluster 1/Dph6, Cluster 2/Duf71-B12, Cluster 3. Taxa in the color box: Eucarya, Archaea, Actinobacteria, Acidobacteria, Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria Alpha, Proteobacteria Beta, Proteobacteria Delta, Proteobacteria Gamma, PVC Planctomycetes, PVC Chlamydiae, Spirochaetes, Thermotogae.]

Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to bona fide Dph6 enzymes involved in diphthamide synthesis (Figure 4). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure 4) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal bona fide Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in P. furiosus) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) [40] and 18/102 are adjacent to cbiB genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the de novo and salvage pathways [41] (Figure 5A). This clustering data can be visualized in the "Duf71-B12" subsystem in the SEED database, and two typical clusters are shown in Figure 5B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with vitamin B12 genes have a role in B12 metabolism.

Figure 5 Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps. (B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosylCbi; AdoCbi-P, adenosylCbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; α-AMP-AP, α-adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5'-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5'-phosphate phosphatase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.
[Figure image not reproduced. Gene clusters legible in panel B: Pyrobaculum calidifontis: cobY, cbiB, cobD, cobT, cobA, DUF71-B12, cobS, aCbiZ/CobZ; Clostridium perfringens: cobT, cobU, cobS, cobC, cbiB, cobD, btuF, btuC, btuD, CbiP, DUF71-B12.]

Finally, some bacterial DUF71 proteins might also have other functions, because a set of bacteria such as Clostridium perfringens have two or more DUF71 homologs (Figure 4 and Additional file 1: Table S1). The most extreme example is Dehalococcoides sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of C. perfringens ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure 4 and 5A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to the Acinetobacter baumannii homolog (Cluster 3, Figure 4) and is not found associated with gene clusters related to B12 salvage (data not shown).

Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into two subgroups, Dph6 and DUF71-B12, which were annotated as such and captured in the "Diphthamide biosynthesis" and "Duf71-B12" subsystems in the SEED database.

Predicting the function of members of the DUF71-B12 subgroup

As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with cbiB (Figure 5B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure 5A). Cobinamide (Cbi) is adenosylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see [42] for review) (Figure 5A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-Pseudo-B12) [44], which contains an adenine instead of DMB as its lower ligand (Figure 1C and 5A).

In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria.
Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure 6). However, in bacteria, there was strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure 6A). This was not the case in Archaea, where quite a few organisms (such as P. furiosus or Methanosarcina mazei Go1) harbored both families (Figure 6B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same role as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging Pseudo-B12. This hypothesis would explain why bacteria have one or the other while Archaea can carry both (Figure 6B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity [44].

Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the DUF71-B12 proteins (Additional file 3: Figure S2 and Additional file 3: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure 3B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family. The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure 3B, see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file 3: Figure S2 and residues in green in Figure 3C).
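The motif difference described above can be turned into a simple sequence screen. The sketch below is an illustration only, not the procedure used in this study: it assumes the two signatures can be approximated by the regular expressions EGG[ED].E (Dph6) and ENGE[FY]H (DUF71-B12), ignores alignment position, and uses invented toy peptides rather than real DUF71 sequences.

```python
import re

# Simplified signatures transcribed from the text (assumption: read as plain
# regular expressions with no positional constraint).
DPH6_MOTIF = re.compile(r"EGG[ED].E")   # EGGE/D-X-E188 in Dph6 proteins
B12_MOTIF = re.compile(r"ENGE[FY]H")    # ENGEF/YH188 in DUF71-B12 proteins


def classify_duf71(sequence: str) -> str:
    """Assign a DUF71 sequence to a subfamily from its C-terminal signature."""
    seq = sequence.upper()
    has_dph6 = DPH6_MOTIF.search(seq) is not None
    has_b12 = B12_MOTIF.search(seq) is not None
    if has_dph6 and not has_b12:
        return "Dph6-like"
    if has_b12 and not has_dph6:
        return "DUF71-B12-like"
    return "unclassified"


# Hypothetical toy peptides, invented for this example.
toy_sequences = {
    "toy_dph6": "MKVAGEGGEFETLVAA",
    "toy_b12": "MKVAGENGEYHTLVAA",
    "toy_other": "MKVAAAAAAAAA",
}

labels = {name: classify_duf71(seq) for name, seq in toy_sequences.items()}
for name, label in labels.items():
    print(name, label)
```

A motif screen of this kind can only flag candidates; as the text stresses, subfamily assignment rested on phylogeny and physical clustering, with the motif as supporting evidence.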
Residues that are conserved between the two DUF71 subfamilies (Additional file 3: Figure S2 and residues in blue in Figure 3C) are found around the phosphate groups of ATP, including S12, G13, G14, K15, D16, H48, and T189 (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184, G185, E186 and F187 (Additional file 3: Figure S2 and Figure 3C). Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases or have another role in Ado-pseudo-B12 salvage.

Conclusions

Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important "missing genes" or "missing function" cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On one hand, the evidence identifying a candidate for the missing Dph6 gene family, derived from genomic evidence (mainly phylogenetic distribution and gene fusions) and post-genomic evidence (structure, co-expression analysis and genome-wide phenotype experiments), is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is most certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.

Additional files

Additional file 1: Table S1.
Genbank RefSeq identities and corresponding organisms for all proteins used in the phylogenies.

Additional file 2: Table S2. GO Term Enrichment Spell analysis (http://imperio.princeton.edu:3000/yeast) with YLR143w as input.

Figure 6 Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). The trees are species trees constructed in iTOL (itol.embl.de/); the presence and absence of the specific genes were derived from the "DUF71-B12" subsystem in the SEED database.
Additional file 3: Figure S1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Figure S2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multialin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Figure S3. Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity, only values greater than 0.85 are indicated. Figure S4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.
Competing interests

The authors declare that they have no competing interests.

Authors' contributions

VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.

Reviewers' comments

Reviewer number 1: Arcady Mushegian
Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, Missouri 64110

The study by de Crécy-Lagard and co-authors pinpoints the DUF71/COG2102 as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, which pathway is responsible for the formation of a unique derivative of the conserved histidine within the translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would recommend only to tone down the suggestion that all this is a "novel paradigm" in analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15 years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work
(compare, for example, with work from the Yanofsky and Jensen labs on biosynthesis of aromatic amino acids); it was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of the data.

Response: The references to a "novel paradigm" were eliminated in the abstract and the introduction as suggested.

Reviewer number 2: Michael Galperin
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

The paper by de Crécy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia lyase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia lyase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure 1B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using their unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia lyase consists of more than one type of subunit. Otherwise, it is a great paper that vividly demonstrates the
power of comparative-genomics approaches.

Response: We added a phrase stating that "even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required".

Reviewer number 3: L. Aravind
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the yeast protein YBR246W is indeed not the correct ligase, but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria, where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might function as a replacement for CbiZ, acting as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) as against a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that the authors necessarily need to address in their revision.

1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They do not explain the meaning of this properly nor cite the reference of the paper (PMID 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table 1 in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains; not all HUP domains are ligases, only the PP-loop and the HIGH nucleotidyltransferases. This clarifies that it is related to other
ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase. This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. In fact, it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.

Response: The following sentence was added: "This family had previously been described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002)". A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10 years, and we show that the family is also bacterial.

2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam and it has long been recognized, as mentioned in the reference noted above, that these proteins are not "domains of unknown function" but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam (not the misleading DUF71) name and Pfam number should be indicated, with just a statement in the introduction that it was formerly DUF71.

Response: This domain is currently called "Domain of unknown function DUF71, ATP-binding domain" in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We however introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.

3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein, both in the text and in the domain architecture rendered in the figure. First, these
proteins have two N-terminal domains fused to the MJ0570-like module, namely an N-terminal class-II glutamine amidotransferase (GAT-II; e.g. see PMID 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain, as in the case of other PP-loop enzymes, could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase, it is used by the cognate PP-loop domain. In this case, the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question: what reaction is the Asn synthetase-like PP-loop domain catalyzing?

Quality of written English: Acceptable

Response: The source of the confusion came from the fact that the Asn Synthase domain (AsnB) contains two domains: the GAT-II domain and the Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer's comments, the sentence discussing the potential role of the AsnB domain was modified as follows: "This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes".

4) Based on phyletic complementarity, the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase?
Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases. So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?

Response: This hypothesis is derived from phylogenetic distribution, and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID: 12359880). However, we agree that this hypothesis derives mainly from the analysis of phylogenetic patterns, and beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family we did not identify specific changes that could point to a shift to hydrolase, hence our caution in our prediction as stated in the text.

Quality of written English: Acceptable

Acknowledgements

This work was supported by the US National Science Foundation (grant MCB-1153413 to V dC-L), the US National Institutes of Health (grant U54GM094597 to GT Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR 10 BINF-01 0127 Ancestrome) to C B-A. We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and for critical evaluation of manuscript parts. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.

Author details

1Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA. 2Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY 10027, USA. 3Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de
Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne F-69622, France.

Received: 17 July 2012 Accepted: 18 September 2012 Published: 26 September 2012

References

1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation elongation factors. FEBS J 2011, 278(15):2613-2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980, 255(22):10717-10720.
3. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. NMR spectra and proposed structures of ribosyl-diphthamide and its hydrolysis products. J Biol Chem 1980, 255(22):10710-10716.
4. Zhang Y, Zhu X, Torelli AT, Lee M, Dzikovski B, Koralewski RM, Wang E, Freed J, Krebs C, Ealick SE, et al.: Diphthamide biosynthesis requires an organic radical generated by an iron-sulphur enzyme. Nature 2010, 465(7300):891-896.
5. Zhu X, Dzikovski B, Su X, Torelli AT, Zhang Y, Ealick SE, Freed JH, Lin H: Mechanistic understanding of Pyrococcus horikoshii Dph2, a [4Fe-4S] enzyme required for diphthamide biosynthesis. Mol Biosyst 2011, 7(1):74-81.
6. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 2009, 25(17):2286-2288.
7. McRee DE: XtalView/Xfit-A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2-3):156-165.
8. Katoh K, Misawa K, Kuma K-i, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30(14):3059-3066.
9. Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ: Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal development. J Cell Sci 2008, 121(19):3140-3145.
10. Zhu X, Kim J, Su X, Lin H: Reconstitution of diphthine synthase activity in vitro.
Biochemistry 2010, 49(44):9649-9657.
11. Mattheakis LC, Shen WH, Collier RJ: DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae. Mol Cell Biol 1992, 12(9):4026-4037.
12. Moehring TJ, Danley DE, Moehring JM: In vitro biosynthesis of diphthamide, studied with mutant Chinese hamster ovary cells resistant to diphtheria toxin. Mol Cell Biol 1984, 4(4):642-650.
13. Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H: YBR246W is required for the third step of diphthamide biosynthesis. J Am Chem Soc 2011, 134(2):773-776.
14. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, et al.: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 2012, 40(D1):D306-D312.
15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, et al.: The Pfam protein families database. Nucleic Acids Res 2010, 38(suppl_1):D211-D222.
16. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D, Mazumder R, Mekhedov S, Nikolskaya A, et al.: The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003, 4(1):41.
17. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
18. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al.: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947-2948.
19. Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988, 16(22):10881-10890.
20. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R, et al.: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes.
Nucleic Acids Res 2005, 33(17):5691-5702.
21. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K, et al.: The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 2010, 38(suppl 1):D382-D390.
22. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The MicrobesOnline web site for comparative genomics. Genome Res 2005, 15(7):1015-1022.
23. Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 2007, 23(20):2692-2699.
24. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR, et al.: Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res 2012, 40(D1):D700-D705.
25. Hillenmeyer M, Ericson E, Davis R, Nislow C, Koller D, Giaever G: Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol 2010, 11(3):R30.
26. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St Onge RP, Tyers M, Koller D, et al.: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 2008, 320(5874):362-365.
27. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23(1):127-128.
28. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188-1190.
29. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012, 40(D1):D130-D135.
30. Philippe H: MUST, a computer package of management utilities for sequences and trees.
Nucleic Acids Res 1993, 21(22):5264-5272.
31. Gouy M, Guindon S, Gascuel O: SeaView Version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27(2):221-224.
32. Aravind L, Anantharaman V, Koonin EV: Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins Struct Funct Bioinform 2002, 48(1):1-14.
33. Forouhar F, Saadat N, Hussain M, Seetharaman J, Lee I, Janjua H, Xiao R, Shastry R, Acton TB, Montelione GT, et al.: A large conformational change in the putative ATP pyrophosphatase PF0828 induced by ATP binding. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011, 67(11):1323-1327.
34. Botet J, Rodriguez-Mateos M, Ballesta JPG, Revuelta JL, Remacha M: A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation Elongation Factor 2 in inhibition of protein synthesis by Sordarin. Antimicrob Agents Chemother 2008, 52(5):1623-1629.
35. Bär C, Zabel R, Liu S, Stark MJR, Schaffrath R: A versatile partner of eukaryotic protein complexes that is involved in multiple biological processes: Kti11/Dph3. Mol Microbiol 2008, 69(5):1221-1233.
36. Jørgensen R, Wang Y, Visschedyk D, Merrill AR: The nature and character of the transition state for the ADP-ribosyltransferase reaction. EMBO Rep 2008, 9(8):802-809.
37. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L: Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins. Mol BioSyst 2009, 5(12):1636-1660.
38. Lambrecht JA, Flynn JM, Downs DM: Conserved YjgF protein family deaminates reactive enamine/imine intermediates of Pyridoxal 5'-Phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012, 287(5):3454-3461.
39. Brochier-Armanet C, Forterre P, Gribaldo S: Phylogeny and evolution of the Archaea: one hundred genomes later.
Curr Opin Microbiol 2011, 14(3):274-281.
40. Borths EL, Poolman B, Hvorup RN, Locher KP, Rees DC: In vitro functional characterization of BtuCD-F, the Escherichia coli ABC transporter for vitamin B12 uptake. Biochemistry 2005, 44(49):16301-16309.
41. Zayas CL, Claas K, Escalante-Semerena JC: The CbiB protein of Salmonella enterica is an integral membrane protein involved in the last step of the de novo corrin ring biosynthetic pathway. J Bacteriol 2007, 189(21):7697-7708.
42. Escalante-Semerena JC: Conversion of cobinamide into adenosylcobamide in bacteria and archaea. J Bacteriol 2007, 189(13):4555-4560.
43. Woodson JD, Escalante-Semerena JC: CbiZ, an amidohydrolase enzyme required for salvaging the coenzyme B12 precursor cobinamide in archaea. Proc Natl Acad Sci USA 2004, 101(10):3591-3596.
44. Gray MJ, Escalante-Semerena JC: The cobinamide amidohydrolase (cobyric acid-forming) CbiZ enzyme: a critical activity of the cobamide remodelling system of Rhodobacter sphaeroides. Mol Microbiol 2009, 74(5):1198-1210.

doi:10.1186/1745-6150-7-32
Cite this article as: de Crécy-Lagard et al.: Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage. Biology Direct 2012 7:32.
Supplemental Figure 1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Legend keys: DPH7, DUF71, DPH4.

Supplemental Figure 2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multialin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Panel label: DUF71-Signature motif.

Supplemental Figure 3. [Tree image not recoverable; scale bar 0.1. Leaf labels follow.] Archaeoglobus veneficus SNP6 YP_004341662; Archaeoglobus fulgidus DSM 4304 NP_070494; Thermococcus barophilus MP YP_004071328; Pyrococcus furiosus DSM 3638 NP_578557; Candidatus Caldiarchaeum subterraneum BAJ48107; Pyrobaculum oguniense TE7 YP_005259944; Pyrobaculum aerophilum str. IM2 NP_560174; Methanopyrus kandleri AV19 NP_613929; Aciduliprofundum boonei T469 YP_003483464; Picrophilus torridus DSM 9790 YP_023745; Methanococcus voltae A3 YP_003708306; Methanocaldococcus jannaschii DSM 2661 NP_247549; Methanosphaera stadtmanae DSM 3091 YP_447605; Methanothermobacter thermautotrophicus str.
Delta H NP_275575; uncultured marine group II euryarchaeote EHR76673; halophilic archaeon DL31 YP_004806933; Haloterrigena turkmenica DSM 5511 YP_003402914; Methanocella arvoryzae MRE50 YP_687132; Methanocella conradii HZ254 YP_005381419; Thermofilum pendens Hrk 5 YP_920509; Methanospirillum hungatei JF-1 YP_503034; Methanosaeta thermophila PT YP_843767; Methanoplanus petrolearius DSM 11571 YP_003894468; Methanoregula boonei 6A8 YP_001404153; Methanosalsum zhilinae DSM 4017 YP_004615790; Methanohalobium evestigatum Z-7303 YP_003727184; Methanococcoides burtonii DSM 6242 YP_565201; Cenarchaeum symbiosum A YP_875359; Nitrosopumilus maritimus SCM1 YP_001581864; Candidatus Nitrosoarchaeum limnia SFB1 ZP_08256363; Candidatus Nitrosoarchaeum koreensis MY1 ZP_08667550; Ignisphaera aggregans DSM 17230 YP_003859822; Staphylothermus marinus F1 YP_001041170; Caldivirga maquilingensis IC-167 YP_001541343; Metallosphaera sedula DSM 5348 YP_001192089; Sulfolobus solfataricus P2 NP_342190; Tetrahymena thermophila XP_001016762; Paramecium tetraurelia strain d4-2 XP_001444982; Plasmodium falciparum 3D7 XP_001350622; Cryptosporidium parvum Iowa II XP_625666; Homo sapiens NP_542381; Drosophila melanogaster NP_572749; Arabidopsis thaliana NP_187098; Chlamydomonas reinhardtii XP_001690073; Saccharomyces cerevisiae S288c NP_013244; Yarrowia lipolytica CLIB122 XP_505418; Trypanosoma brucei brucei strain 927/4 GUTat10.1 XP_828128; Leishmania major strain Friedlin XP_001683782; Phaeodactylum tricornutum CCAP 1055/1 XP_002180405; Thalassiosira pseudonana CCMP1335 XP_002292404.

Group labels: Alveolata, Metazoa, Plantae, Fungi, Stramenopiles, Excavata, Archaeoglobales, Thermococcales, Thermoproteales, Thermoplasmatales/DHVE2, Methanococcales, Methanobacteriales, Halobacteriales, Methanocellales, Sulfolobales, Methanomicrobia, Thermoproteales, Desulfurococcales, Thaumarchaeota, Group II, Aigarchaeota, Methanopyrales.

Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site.
Numbers at nodes represent posterior probabilities. For clarity, only values greater than 0.85 are indicated.

Supplemental Figure 4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.

Article record

Article identifier: 1745-6150-7-32
Journal identifier (ISSN): 1745-6150
Article type: Research
Title: Comparative genomic analysis of the DUF71/COG2102 family predicts roles in diphthamide biosynthesis and B12 salvage
Authors: Valérie de Crécy-Lagard (corresponding author, affiliation 1, vcrecy@ufl.edu); Farhad Forouhar (affiliation 2, farhadf@biology.columbia.edu); Céline Brochier-Armanet (affiliation 3, celine.brochier-armanet@univ-lyon1.fr); Liang Tong (ltong@columbia.edu); John F Hunt (fhunt1@gmail.com)
Affiliations: (1) Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32611, USA; (2) Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY, 10027, USA; (3) Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne, F-69622, France
Source: Biology Direct, section Genomics, bioinformatics and systems biology, ISSN 1745-6150, 2012, volume 7, issue 1, first page 32
URL: http://www.biology-direct.com/content/7/1/32
DOI: 10.1186/1745-6150-7-32
PMID: 23013770
History: received 17 July 2012; accepted 18 September 2012; published 26 September 2012
Copyright: 2012 de Crécy-Lagard et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License
(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Keywords: Diphthamide, Vitamin B12, Amidotransferase, Comparative genomics

Abstract

Background: The availability of over 3000 published genome sequences has enabled the use of comparative genomic approaches to drive the biological function discovery process. Classically, one used to link gene with function by genetic or biochemical approaches, a lengthy process that often took years. Phylogenetic distribution profiles, physical clustering, gene fusion, co-expression profiles, structural information and other genomic or post-genomic derived associations can be now used to make very strong functional hypotheses. Here, we illustrate this shift with the analysis of the DUF71/COG2102 family, a subgroup of the PP-loop ATPase family.

Results: The DUF71 family contains at least two subfamilies, one of which was predicted to be the missing diphthine-ammonia ligase (EC 6.3.1.14), Dph6. This enzyme catalyzes the last ATP-dependent step in the synthesis of diphthamide, a complex modification of Elongation Factor 2 that can be ADP-ribosylated by bacterial toxins. Dph6 orthologs are found in nearly all sequenced Archaea and Eucarya, as expected from the distribution of the diphthamide modification. The DUF71 family appears to have originated in the Archaea/Eucarya ancestor and to have been subsequently horizontally transferred to Bacteria. Bacterial DUF71 members likely acquired a different function because the diphthamide modification is absent in this Domain of Life. In-depth investigations suggest that some archaeal and bacterial DUF71 proteins participate in B12 salvage.
Conclusions: This detailed analysis of the DUF71 family members provides an example of the power of integrated data-mining for solving important “missing genes” or “missing function” cases and illustrates the danger of functional annotation of protein families by homology alone.

Reviewers’ names: This article was reviewed by Arcady Mushegian, Michael Galperin and L. Aravind.

Background

In both Archaea and Eucarya, the translation Elongation Factor 2 (EF-2) harbors a complex post-translational modification of a strictly conserved histidine (His699 in yeast) called diphthamide [1]. This modification is the target of the diphtheria toxin and the Pseudomonas exotoxin A, which inactivate EF-2 by ADP-ribosylation of the diphthamide [2,3]. Although the diphthamide biosynthesis pathway was described in the early 1980s [2,3], the corresponding enzymes have only recently been characterized. In vitro reconstitution experiments have shown that the first step, the transfer of a 3-amino-3-carboxypropyl (ACP) group from S-adenosylmethionine (SAM) to the C-2 position of the imidazole ring of the target histidine residue, is catalyzed in Archaea by the iron-sulfur-cluster enzyme Dph2 [4,5] (Figure 1A). Genetic and complementation studies have shown that the catalysis of the same first step requires four proteins (Dph1-Dph4) in yeast and other eukaryotes [6-9]. The subsequent step, trimethylation of an amino group to form the diphthine intermediate, is catalyzed by diphthine synthase, Dph5 (EC 2.1.1.98) (Figure 1A) [10,11]. The last step, the ATP-dependent amidation of the carboxylate group [12], is catalyzed by diphthine-ammonia ligase (EC 6.3.1.14), but the corresponding gene has not been identified (http://www.orenza.u-psud.fr/).
A protein involved in this last step was recently identified in yeast (YBR246W or Dph7), but it is most certainly not directly involved in catalysis as it is not conserved in Archaea and it contains a WD-domain likely to be involved in protein/protein interactions [13].

Figure 1. Structures of diphthamide and B12 precursors and derivatives. (A) The core diphthamide pathway is predicted to contain three enzymes, Dph2, Dph5 and Dph6, in Archaea. The formation of diphthine has been reconstituted in vitro using Dph2 and Dph5 from Pyrococcus horikoshii [4,5]. The enzyme family catalyzing the last step in Archaea and Eukarya, Dph6, was missing. In yeast, the first and last steps require additional proteins (Dph1, Dph3 and Dph7). (B) Predicted Dph6-catalyzed reactions. (C) Ado-Pseudo-B12 structure and hydrolysis site by the bacterial CbiZ enzyme (bCbiZ). Parts (A) and (B) are adapted with permission from Xuling Zhu; Jungwoo Kim; Xiaoyang Su; Hening Lin; Biochemistry 2010, 49, 9649–9657. Copyright 2010 American Chemical Society.

Using a combination of comparative genomic approaches, we set out to identify a candidate gene for this orphan enzyme family. Based on taxonomic distribution, domain organization of gene fusions, physical clustering on chromosomes, atomic structural data, co-expression, and phenotype data, a promising candidate was identified: the family called Domain of Unknown Function family DUF71 (IPR002761) in InterPro [14]. This family is also called ATP_bind_4 (PF01902) in Pfam [15] or Predicted ATPases of PP-loop superfamily (COG2102) in the Clusters of Orthologous Groups database [16]. However, detailed analysis of the DUF71 family revealed that this family is almost surely not isofunctional.
Some Archaea contain two very divergent copies of the gene, while homologs are found in Bacteria, which are known to lack diphthamide. This observation suggests that some DUF71 members have different functions and probably participate in different biochemical pathways.

Methods

Comparative genomics

The BLAST tools [17] and resources at NCBI (http://www.ncbi.nlm.nih.gov/) were routinely used. Multiple sequence alignments were built using ClustalW [18] or Multialin [19]. Protein domain analysis was performed using the Pfam database tools (http://pfam.janelia.org/) [15]. Analysis of the phylogenetic distribution and physical clustering was performed in the SEED database [20]. Results are available in the “Diphthamide biosynthesis” and “DUF71-B12” subsystems on the public SEED server (http://pubseed.theseed.org/SubsysEditor.cgi). Phylogenetic profile searches were performed on the IMG platform [21] using the phylogenetic query tool (http://img.jgi.doe.gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm). Physical clustering was analyzed with the SEED subsystem coloring tool or the Seedviewer Compare region tool [20] as well as on the MicrobesOnline (http://www.microbesonline.org/) tree-based genome browser [22]. The SPELL microarray analysis resource [23] was used through the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org/) [24] to analyze yeast gene coexpression profiles. Clustering of yeast deletion mutants based on phenotype analysis was analyzed through the yeast fitness database available at http://fitdb.stanford.edu/ [25,26]. Mappings of gene distribution profiles onto taxonomic trees were generated using the iTOL suite (http://itol.embl.de/index.shtml) [27]. Sequence logos were derived using the WebLogo platform [28].
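As an illustration of what a sequence logo summarizes, the following minimal Python sketch computes the per-column information content of a protein alignment. It is not the WebLogo implementation: it assumes a 20-letter alphabet, ignores gap characters and omits the small-sample correction. The toy alignment is hypothetical and is not taken from the SEED subsystems.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies; gap characters ('-') are ignored.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned_seqs):
    """Total stack height (bits) for every column of an alignment."""
    return [column_information(col) for col in zip(*aligned_seqs)]


# Hypothetical toy alignment (all sequences must have the same length).
toy_alignment = ["EGGEFE", "EGGEYE", "EGGDFE", "ENGEFH"]
heights = logo_heights(toy_alignment)
for position, bits in enumerate(heights, start=1):
    print(f"column {position}: {bits:.2f} bits")
```

A fully conserved column reaches the maximum of log2(20) bits, whereas a column with mixed residues gives a shorter stack.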
Structure analysis

Visualization and comparison of protein structures and manual docking of ligand molecules were performed using PyMOL (The PyMOL Molecular Graphics System, Version 1.4.1, Schrödinger, LLC). XtalView 7 was used for the protein docking exercises.

Phylogenetic analyses

A survey of the 1,996 complete prokaryotic genomes available at the NCBI (http://www.ncbi.nlm.nih.gov/) using BLASTP [17] (default parameters) allowed identification of 119 bacterial and 144 archaeal DUF71 homologs, in addition to the 182 eukaryotic homologs identified in the RefSeq database at the NCBI [29] (Additional file 1: Table S1). The retrieved sequences were aligned using MAFFT 8 and the resulting alignment was visually inspected using ED, the alignment editor of the MUST package [30]. The phylogenetic analysis of the 445 sequences was performed using the neighbor-joining distance method implemented in SeaView [31]. The robustness of the resulting tree was assessed by the non-parametric bootstrap method (100 replicates of the original dataset) implemented in SeaView. A second phylogenetic analysis, restricted to 50 archaeal and eukaryotic homologs representative of the genetic and genomic diversity of these two Domains, was performed using the Bayesian approach implemented in Phylobayes 6 with an LG model.

Additional file 1: Table S1. GenBank RefSeq identities and corresponding organisms for all proteins used in the phylogenies. (File: 1745-6150-7-32-S1.xlsx)

Results and discussion

Comparative genomics points to DUF71/COG2102 as a strong candidate for the missing diphthamide synthase family

The distribution of known diphthamide biosynthesis genes in Archaea was analyzed using the SEED database and its tools [20]. The 59 archaeal genomes analyzed all contained an EF-2 encoding gene. Analysis of the distribution of Dph2 and Dph5 in Archaea showed that 58/59 genomes encoded these two proteins.
The only archaeon lacking both Dph2 and Dph5 was Korarchaeum cryptofilum OPF8 (Figure 2A). We therefore hypothesized that this organism has lost the diphthamide modification pathway, even though the K. cryptofilum EF-2 still harbors the conserved His residue at the site of the modification (His603 in the K. cryptofilum sequence, Accession B1L7Q0 in UniProtKB). Using the IMG/JGI phylogenetic query tools [21], we searched for protein families found in all Archaea except Korarchaeum cryptofilum OPF8, present in Saccharomyces cerevisiae and Homo sapiens but absent in Escherichia coli and Bacillus subtilis, as bacteria are known to lack this modification pathway. Only one family, DUF71/COG2102, followed this taxonomic distribution. This family had been described previously as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain [32].

Figure 2. Comparative genomic analysis of the DUF71 family. (A) Distribution of the core diphthamide genes Dph2 and Dph5 and of EF-2 and DUF71 in Archaea, according to data derived from the “Diphthamide biosynthesis” subsystem in the SEED database. The tree is a species tree constructed in iTOL (itol.embl.de/). The presence and absence of the specific genes were derived from the “Diphthamide biosynthesis” subsystem. (B) Physical clustering of DUF71/COG2102 genes with Dph5 in three Methanosarcina genomes derived from the MicrobesOnline database (http://www.microbesonline.org/). (C) Examples of proteins containing domains fused to DUF71 in Archaea and Eucarya. Accession numbers and COG, CDD, or Pfam domain numbers are given in parentheses.
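The logic of the phylogenetic profile query described above (families present in every required genome and absent from every excluded genome) can be sketched in a few lines of Python. This is only an illustration of the set logic, not the IMG query tool. The family names and presence sets below are hypothetical toy data, and only two archaeal genomes stand in for "all Archaea except K. cryptofilum".

```python
def matches_profile(present_in, required, forbidden):
    """True if a family occurs in every required genome and in no forbidden one."""
    return required <= present_in and not (forbidden & present_in)


# Genomes that must encode the family (toy stand-ins for the real query).
required = {
    "Pyrococcus furiosus",
    "Sulfolobus solfataricus",
    "Saccharomyces cerevisiae",
    "Homo sapiens",
}
# Genomes that must lack the family.
forbidden = {
    "Korarchaeum cryptofilum",
    "Escherichia coli",
    "Bacillus subtilis",
}

# Hypothetical presence/absence data: family name -> genomes encoding it.
families = {
    "family_A": required | {"Clostridium perfringens"},  # fits the profile
    "family_B": required | forbidden,                     # universal family
    "family_C": required - {"Homo sapiens"},              # missing in human
}

hits = [name for name, genomes in families.items()
        if matches_profile(genomes, required, forbidden)]
print(hits)
```

Note that the query only excludes the listed genomes: a family such as the hypothetical family_A may still occur in other bacteria, as DUF71 does.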
Using the neighborhood analysis tool of the SEED database [20], physical clustering was generally not observed between the dph2, dph5 and DUF71 genes, except in three Methanosarcina genomes where dph5 is located in the vicinity of DUF71 genes (Figure 2B). If members of the DUF71 family catalyze the last step of diphthamide synthesis, they should bind ATP [12]. Structural analysis of the DUF71 protein from Pyrococcus furiosus (PF0828) reveals the presence of two distinct domains: an N-terminal HUP domain that contains a highly conserved PP-motif that interacts with ATP (PDB id: 3RK1) and AMP (PDB id: 3RK0), and a C-terminal 100-residue domain belonging to a novel fold with a highly conserved motif GEGGEF/YE188T/S (P. furiosus numbering) that is probably involved in substrate binding and recognition [33].

Coexpression, phenotype and structural data link the yeast DUF71 to translation and diphthamide biosynthesis

YLR143w is the only S. cerevisiae DUF71 family member. Using YLR143w as input in the SPELL co-expression query tool [23] showed that nearly all co-expressed genes were involved in translation and ribosome biogenesis (Additional file 2: Table S2). This observation suggested that the DUF71 protein family has a role in translation, as expected for a protein modifying EF-2. Like all known diphthamide synthesis genes, YLR143w is not essential. More specifically, deletion of any of the five known diphthamide genes confers sordarin resistance in yeast [34,35], and the ylr143wΔ strain was shown to be as resistant to this compound as the diphthamide-deficient strains (see supplemental data in [34]).
Furthermore, in a recent complete analysis of relationships between gene fitness profiles (co-fitness) and drug inhibition profiles (co-inhibition) from several hundred chemogenomic screens in yeast [25,26], available at http://fitdb.stanford.edu/, it was found that the top ten interactors with YLR143w by homozygous co-sensitivity include DPH5, DPH2, DPH4 (or JJJ3) and the newly identified DPH7 (or YBR246w) (Additional file 3: Figure S1). Both the coexpression and phenotype data thereby strongly support the hypothesis that YLR143w catalyzes the missing last step of diphthamide biosynthesis, even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required.

Additional file 2: Table S2. GO Term Enrichment SPELL analysis (http://imperio.princeton.edu:3000/yeast) with YLR143w as input. (File: 1745-6150-7-32-S2.xlsx)

Additional file 3:
Figure S1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database, http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W).
Figure S2. Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multialin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1).
Figure S3. Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity only values greater than 0.85 are indicated.
Figure S4. (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments. (File: 1745-6150-7-32-S3.pdf)

Finally, comparison of the ATP- and AMP-containing structures of PF0828 reveals that the active site of the former has a narrow groove, at the end of which only the α-phosphate of ATP is exposed to the solvent, whereas the active site of the latter is wide open (Figure 3A and B). Also, there is a sharp turn at the α-phosphate of ATP, suggesting that it is the site of the nucleophilic attack. We therefore performed a docking exercise using the EF-2 structure (PDB id: 3B82) [36] with the ATP-containing structure of PF0828. The docking revealed that the active site groove of the ATP-containing structure can easily accommodate diphthine with a few minor clashes between the two structures (Figure 3A and B).

Figure 3. Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto the ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (B) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (C) Stereo pair view of the ATP-binding region of PF0828.
Residues that are conserved among Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to the Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue in all stick models, respectively.

The modeling also showed that the carboxyl group of diphthine resides near the α-phosphate of ATP and the carboxylate group of residue Glu188, suggesting that nucleophilic attack by diphthine on the α-phosphate of ATP is highly feasible (Figure 3B). The modeling further shows that several residues besides E188 that are highly conserved among archaeal and eukaryotic PF0828 and YLR143w orthologs, including S44, Y45, E78, Y103, Q104, A149, E183 and E186 (Additional file 3: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure 3B).

Linking DUF71 family members to ammonia transfer reactions

The diphthine-ammonia ligase reaction requires a source of NH3 [12]. Domain fusions involving members of the DUF71 family in the Pfam database [15] suggest that the source of NH3 might vary depending on the organism. For example, in a few Archaea (e.g. Methanohalophilus mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or ‘Candidatus Nanosalinarum sp. J07AB56’), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure 2C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) [37] and an Asn_Synthase_B_C PP-loop ATPase domain (Figure 2C). This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes.
On the other hand, in many eukaryotes such as yeast and Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein, as previously noted by Aravind et al. [32] (Figure 2C). The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH3 [38]. The RidA domains fused to DUF71 could therefore be involved in providing the NH3 moiety for diphthamide synthesis.

The DUF71 family is not monofunctional

The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon K. cryptofilum (Table 1 and Additional file 1: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table 1 and Additional file 1: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages. To further investigate the evolutionary history of DUF71, we made a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and P. furiosus PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure 4 and Additional file 3: Figure S3).
Table 1. Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum               Nb (%) genomes    Phylum                 Nb (%) genomes    Phylum                   Nb (%) genomes
Archaea
Crenarchaeota        37/37 (100%)      Korarchaeota           0/1 (0%)          Thaumarchaeota           2/2 (100%)
Euryarchaeota        79/79 (100%)
Bacteria
Acidobacteria        3/7 (42.9%)       Dictyoglomi            0/2 (0%)          Proteobacteria_Epsilon   0/64 (0%)
Actinobacteria       1/206 (0.5%)      Elusimicrobia          0/2 (0%)          Proteobacteria_Gamma     27/406 (6.7%)
Aquificae            0/10 (0%)         Fibrobacteres          0/2 (0%)          PVC_Chlamydiae           1/73 (1.4%)
Bacteroidetes        20/73 (27.4%)     Firmicutes             20/484 (4.1%)     PVC_Planctomycetes       3/6 (50%)
Caldiserica          0/1 (0%)          Fusobacteria           0/5 (0%)          PVC_Verrucomicrobia      0/4 (0%)
Chlorobi             0/11 (0%)         Gemmatimonadetes       0/1 (0%)          Spirochaetes             1/45 (2.2%)
Chloroflexi          5/16 (31.3%)      Ignavibacteria         0/1 (0%)          Synergistetes            0/4 (0%)
Chrysiogenetes       0/1 (0%)          Nitrospirae            1/3 (33.3%)       Thermodesulfobacteria    0/2 (0%)
Cyanobacteria        0/45 (0%)         Proteobacteria_Alpha   2/204 (1%)        Thermotogae              5/14 (35.7%)
Deferribacteres      0/4 (0%)          Proteobacteria_Beta    8/119 (6.7%)
Deinococcus-Thermus  2/17 (11.8%)      Proteobacteria_Delta   1/48 (2.1%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.

Figure 4. Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3: Figure S3.
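The contrast between the near-universal archaeal distribution and the patchy bacterial distribution reported in Table 1 can be checked with a short tally. The sketch below is only an arithmetic illustration: the counts are copied from Table 1, the bacterial dictionary is a deliberately partial subset of the phyla, and the function and variable names are hypothetical.

```python
# Counts copied from Table 1:
# (genomes with at least one DUF71 homolog, genomes surveyed).
ARCHAEA = {
    "Crenarchaeota": (37, 37),
    "Korarchaeota": (0, 1),
    "Thaumarchaeota": (2, 2),
    "Euryarchaeota": (79, 79),
}
# Partial subset of the bacterial phyla, for illustration only.
BACTERIA_SUBSET = {
    "Actinobacteria": (1, 206),
    "Bacteroidetes": (20, 73),
    "Cyanobacteria": (0, 45),
    "Firmicutes": (20, 484),
    "Proteobacteria_Gamma": (27, 406),
}


def percent(with_homolog, total):
    """Percentage of genomes with a homolog, rounded to one decimal."""
    return round(100.0 * with_homolog / total, 1)


def pooled(counts):
    """Pool several phyla: (genomes with homolog, genomes surveyed, percent)."""
    hits = sum(h for h, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return hits, total, percent(hits, total)


archaea_summary = pooled(ARCHAEA)
bacteria_summary = pooled(BACTERIA_SUBSET)
per_phylum = {name: percent(*pair) for name, pair in BACTERIA_SUBSET.items()}

print("Archaea:", archaea_summary)
print("Bacteria (subset):", bacteria_summary)
print(per_phylum)
```

Pooling the archaeal rows gives 118 of 119 genomes, the single exception being the korarchaeon, while the per-phylum bacterial percentages reproduce the values printed in the table.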
This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy, because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure 4). These observations, together with the extremely patchy distribution of DUF71 in bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing them to acquire a second DUF71 homolog.

In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probability (PP) = 1). Moreover, it recovered the monophyly of most eukaryotic and archaeal major lineages (most PP > 0.95, Additional file 3: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP < 0.95), precluding an in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file 3: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as Microsporidia, Cryptosporidium, Entamoeba or Nanoarchaeum equitans, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms [39] suggested putative cases of HGT.
For instance, this was observed for the Thermofilum pendens DUF71, which robustly groups with Methanomicrobia (Euryarchaeota) and not with other Thermoproteales (Additional file 3: Figure S3).

Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to bona fide Dph6 enzymes involved in diphthamide synthesis (Figure 4). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure 4) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal bona fide Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in P. furiosus) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) [40] and 18/102 are adjacent to cbiB genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the de novo and salvage pathways [41] (Figure 5A). These clustering data can be visualized in the “DUF71-B12” subsystem in the SEED database, and two typical clusters are shown in Figure 5B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with vitamin B12 genes have a role in B12 metabolism.

Figure 5. Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps.
(B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosylCbi; AdoCbi-P, adenosylCbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; α-AMP-AP, α-adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5′-phosphate phosphatase; cobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.

Finally, some bacterial DUF71 proteins might also have other functions, because a set of bacteria such as Clostridium perfringens have two or more DUF71 homologs (Figure 4 and Additional file 1: Table S1). The most extreme example is Dehalococcoides sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of C. perfringens ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure 4 and 5A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to that of Acinetobacter baumannii (Cluster 3, Figure 4) and is not associated with gene clusters related to B12 salvage (data not shown).

Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into two subgroups, Dph6 and DUF71-B12, which were annotated as such and captured in the “Diphthamide biosynthesis” and “DUF71-B12” subsystems in the SEED database.
Predicting the function of members of the DUF71-B12 subgroup

As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with cbiB (Figure 5B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure 5A). Cobinamide (Cbi) is adenylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see [42] for review) (Figure 5A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-Pseudo-B12) [44], which contains an adenine instead of DMB as its lower ligand (Figure 1C and 5A).

In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure 6). However, in bacteria, there was strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure 6A). This was not the case in Archaea, where quite a few organisms (such as P. furiosus or Methanosarcina mazei Go1) harbored both families (Figure 6B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same roles as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging Pseudo-B12.
This hypothesis would explain why bacteria would have one or the other while Archaea could carry both (Figure 6B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity [44].

Figure 6. Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). The trees are species trees constructed in iTOL (itol.embl.de/); the presence and absence of the specific genes were derived from the “DUF71-B12” subsystem in the SEED database.

Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the DUF71-B12 proteins (Additional file 3: Figure S2 and Additional file 3: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure 3B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family. The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure 3B, see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file 3: Figure S2 and residues in green in Figure 3C). Residues that are conserved between the two DUF71 subfamilies (Additional file 3: Figure S2 and residues in blue in Figure 3C) are found around the phosphate groups of ATP, including S12, G13, G14, K15, D16, H48, and T189 (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184, G185, E186 and F187 (Additional file 3: Figure S2 and Figure 3C).
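The motif difference described above lends itself to a simple pattern scan. The following Python sketch translates the two signature motifs as written in the text into regular expressions (EGG[ED].E for EGGE/DXE188 and ENGE[FY]H for ENGEF/YH188) and labels sequences accordingly. This is a simplification offered for illustration only: the subgroups in this study were defined by phylogeny and physical clustering, not by motif matching, and the peptide fragments below are hypothetical.

```python
import re

# Regular-expression renderings of the motifs quoted in the text.
# The last residue of each pattern corresponds to position 188 (PF0828 numbering).
DPH6_MOTIF = re.compile(r"EGG[ED].E")   # EGGE/DXE188 (Dph6 subfamily)
B12_MOTIF = re.compile(r"ENGE[FY]H")    # ENGEF/YH188 (DUF71-B12 subfamily)


def classify(sequence):
    """Label a protein sequence by which signature motif it contains."""
    if DPH6_MOTIF.search(sequence):
        return "Dph6-like"
    if B12_MOTIF.search(sequence):
        return "DUF71-B12-like"
    return "unassigned"


# Hypothetical peptide fragments, not real database sequences.
toy_sequences = {
    "toy_dph6": "LVGEGGEFETLVL",
    "toy_b12": "IAENGEYHSVAL",
    "toy_other": "GAGGKFQTLAAA",
}

labels = {name: classify(seq) for name, seq in toy_sequences.items()}
for name, label in labels.items():
    print(name, label)
```

A scan of this kind can flag candidates for manual inspection, but it cannot replace the phylogenetic and gene-neighborhood evidence used to assign family members.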
Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases or have another role in Ado-pseudo-B12 salvage.

Conclusions

Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important “missing genes” or “missing function” cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On one hand, the evidence identifying a candidate for the missing Dph6 gene family, derived from genomic evidence (mainly phylogenetic distribution and gene fusions) and post-genomic evidence (structure, co-expression analysis and genome-wide phenotype experiments), is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is most certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.

Competing interests

The author(s) declare that they have no competing interests.

Authors’ contributions

VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.
Reviewers’ comments

Reviewer number 1: Arcady Mushegian, Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, Missouri 64110

The study by de Crecy-Lagard and co-authors pinpoints the DUF71/COG2102 family as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, which is responsible for the formation of a unique derivative of the conserved histidine within translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would only recommend toning down the suggestion that all this is a “novel paradigm” in the analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15 years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work (compare, for example, with work from the Yanofsky and Jensen labs on biosynthesis of aromatic amino acids); it was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of data.

Response: The references to a “novel paradigm” were eliminated in the abstract and the introduction as suggested.
Reviewer number 2: Michael Galperin, NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

The paper by de Crecy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia ligase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia ligase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure 1B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using their unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia ligase consists of more than one type of subunit. Otherwise, it is a great paper that vividly demonstrates the power of comparative-genomics approaches.

Response: We added a phrase stating that “even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required”.

Reviewer number 3: L. Aravind, NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the correct ligase is indeed not the yeast protein YBR246W, but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria, where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might function as a replacement for CbiZ, acting as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) as against a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that the authors necessarily need to address in their revision:

1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They neither explain the meaning of this properly nor cite the paper (PMID: 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table 1 in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains (not all HUP domains are ligases, only the PP-loop and the HIGH nucleotidyltransferases). This clarifies that it is related to other ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase.
This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. In fact it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.

Response: The following sentence was added: “This family had previously been described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002).” A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10 years, and we show that the family is also bacterial.

2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam, and it has long been recognized, as mentioned in the reference noted above, that these proteins are not “domains of unknown function” but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam name (not the misleading DUF71) and the Pfam number should be indicated, with just a statement in the introduction that it was formerly DUF71.

Response: This domain is currently called “Domain of unknown function DUF71, ATP-binding domain” in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We however introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.

3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein, both in the text and in the domain architecture rendered in the figure.
First, these proteins have two N-terminal domains fused to the MJ0570-like module: namely an N-terminal class-II glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain, as in the case of other PP-loop enzymes, could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase it is used by the cognate PP-loop domain. In this case the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question: what reaction is the Asn synthetase-like PP-loop domain catalyzing?

Response: The source of the confusion came from the fact that the Asn Synthase domain (AsnB) contains two domains: the GAT-II domain and the Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer’s comments, the sentence discussing the potential role of the AsnB domain was modified as follows: “This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes.”

4) Based on phyletic complementarity the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase? Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases.
So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?

Response: This hypothesis is derived from phylogenetic distribution, and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID:12359880). However, we agree that this hypothesis derives mainly from the analysis of phylogenetic patterns, and beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family we did not identify specific changes that could point to a shift to hydrolase, hence the caution in our prediction as stated in the text.

Quality of written English: Acceptable

Acknowledgements

This work was supported by the US National Science Foundation (grant MCB-1153413 to V. dC-L), the US National Institutes of Health (grant U54GM094597 to G.T. Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR-10-BINF-01-0127 Ancestrome) to C. B-A. We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and for critical evaluation of manuscript parts. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.

References

1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation elongation factors. FEBS J 2011, 278(15):2613-2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980, 255(22):10717-10720.
3. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. NMR spectra and proposed structures of ribosyl-diphthamide and its hydrolysis products. J Biol Chem 1980, 255(22):10710-10716.
4. Zhang Y, Zhu X, Torelli AT, Lee M, Dzikovski B, Koralewski RM, Wang E, Freed J, Krebs C, Ealick SE, et al: Diphthamide biosynthesis requires an organic radical generated by an iron–sulphur enzyme. Nature 2010, 465(7300):891-896.
5. Zhu X, Dzikovski B, Su X, Torelli AT, Zhang Y, Ealick SE, Freed JH, Lin H: Mechanistic understanding of Pyrococcus horikoshii Dph2, a [4Fe-4S] enzyme required for diphthamide biosynthesis. Mol Biosyst 2011, 7(1):74-81.
6. Lartillot N, Lepage T, Blanquart S: PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 2009, 25(17):2286-2288.
7. McRee DE: XtalView/Xfit—A versatile program for manipulating atomic coordinates and electron density. J Struct Biol 1999, 125(2–3):156-165.
8. Katoh K, Misawa K, Kuma K-I, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002, 30(14):3059-3066.
9. Webb TR, Cross SH, McKie L, Edgar R, Vizor L, Harrison J, Peters J, Jackson IJ: Diphthamide modification of eEF2 requires a J-domain protein and is essential for normal development. J Cell Sci 2008, 121(19):3140-3145.
10. Zhu X, Kim J, Su X, Lin H: Reconstitution of diphthine synthase activity in vitro. Biochemistry 2010, 49(44):9649-9657.
11. Mattheakis LC, Shen WH, Collier RJ: DPH5, a methyltransferase gene required for diphthamide biosynthesis in Saccharomyces cerevisiae. Mol Cell Biol 1992, 12(9):4026-4037.
12. Moehring TJ, Danley DE, Moehring JM: In vitro biosynthesis of diphthamide, studied with mutant Chinese hamster ovary cells resistant to diphtheria toxin. Mol Cell Biol 1984, 4(4):642-650.
13. Su X, Chen W, Lee W, Jiang H, Zhang S, Lin H: YBR246W is required for the third step of diphthamide biosynthesis. J Am Chem Soc 2011, 134(2):773-776.
14. Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S: InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 2012, 40(D1):D306-D312.
15. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K: The Pfam protein families database. Nucleic Acids Res 2010, 38(suppl 1):D211-D222.
16. Tatusov R, Fedorova N, Jackson J, Jacobs A, Kiryutin B, Koonin E, Krylov D, Mazumder R, Mekhedov S, Nikolskaya A: The COG database: an updated version includes eukaryotes. BMC Bioinforma 2003, 4(1):41.
17. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25(17):3389-3402.
18. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R: Clustal W and Clustal X version 2.0. Bioinformatics 2007, 23(21):2947-2948.
19. Corpet F: Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res 1988, 16(22):10881-10890.
20. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crécy-Lagard V, Diaz N, Disz T, Edwards R: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 2005, 33(17):5691-5702.
21. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K: The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res 2010, 38(suppl 1):D382-D390.
22. Alm EJ, Huang KH, Price MN, Koche RP, Keller K, Dubchak IL, Arkin AP: The MicrobesOnline web site for comparative genomics. Genome Res 2005, 15(7):1015-1022.
23. Hibbs MA, Hess DC, Myers CL, Huttenhower C, Li K, Troyanskaya OG: Exploring the functional landscape of gene expression: directed search of large microarray compendia. Bioinformatics 2007, 23(20):2692-2699.
24. Cherry JM, Hong EL, Amundsen C, Balakrishnan R, Binkley G, Chan ET, Christie KR, Costanzo MC, Dwight SS, Engel SR: Saccharomyces genome database: the genomics resource of budding yeast. Nucleic Acids Res 2012, 40(D1):D700-D705.
25. Hillenmeyer M, Ericson E, Davis R, Nislow C, Koller D, Giaever G: Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol 2010, 11(3):R30.
26. Hillenmeyer ME, Fung E, Wildenhain J, Pierce SE, Hoon S, Lee W, Proctor M, St.Onge RP, Tyers M, Koller D: The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 2008, 320(5874):362-365.
27. Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 2007, 23(1):127-128.
28. Crooks GE, Hon G, Chandonia J-M, Brenner SE: WebLogo: a sequence logo generator. Genome Res 2004, 14(6):1188-1190.
29. Pruitt KD, Tatusova T, Brown GR, Maglott DR: NCBI Reference sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res 2012, 40(D1):D130-D135.
30. Philippe H: MUST, a computer package of management utilities for sequences and trees. Nucleic Acids Res 1993, 21(22):5264-5272.
31. Gouy M, Guindon S, Gascuel O: SeaView Version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 27(2):221-224.
32. Aravind L, Anantharaman V, Koonin EV: Monophyly of class I aminoacyl tRNA synthetase, USPA, ETFP, photolyase, and PP-ATPase nucleotide-binding domains: implications for protein evolution in the RNA world. Proteins: Structure, Funct Bioinform 2002, 48(1):1-14.
33. Forouhar F, Saadat N, Hussain M, Seetharaman J, Lee I, Janjua H, Xiao R, Shastry R, Acton TB, Montelione GT: A large conformational change in the putative ATP pyrophosphatase PF0828 induced by ATP binding. Acta Crystallogr Sect F Struct Biol Cryst Commun 2011, 67(11):1323-1327.
34. Botet J, Rodriguez-Mateos M, Ballesta JPG, Revuelta JL, Remacha M: A chemical genomic screen in Saccharomyces cerevisiae reveals a role for diphthamidation of translation Elongation Factor 2 in inhibition of protein synthesis by Sordarin. Antimicrob Agents Chemother 2008, 52(5):1623-1629.
35. Bär C, Zabel R, Liu S, Stark MJR, Schaffrath R: A versatile partner of eukaryotic protein complexes that is involved in multiple biological processes: Kti11/Dph3. Mol Microbiol 2008, 69(5):1221-1233.
36. Jorgensen R, Wang Y, Visschedyk D, Merrill AR: The nature and character of the transition state for the ADP-ribosyltransferase reaction. EMBO Rep 2008, 9(8):802-809.
37. Iyer LM, Abhiman S, Maxwell Burroughs A, Aravind L: Amidoligases with ATP-grasp, glutamine synthetase-like and acetyltransferase-like domains: synthesis of novel metabolites and peptide modifications of proteins. Mol BioSys 2009, 5(12):1636-1660.
38. Lambrecht JA, Flynn JM, Downs DM: Conserved YjgF protein family deaminates reactive enamine/imine intermediates of Pyridoxal 5′-Phosphate (PLP)-dependent enzyme reactions. J Biol Chem 2012, 287(5):3454-3461.
39. Brochier-Armanet C, Forterre P, Gribaldo S: Phylogeny and evolution of the Archaea: one hundred genomes later. Curr Opin Microbiol 2011, 14(3):274-281.
40. Borths EL, Poolman B, Hvorup RN, Locher KP, Rees DC: In vitro functional characterization of BtuCD-F, the Escherichia coli ABC transporter for vitamin B12 uptake. Biochemistry 2005, 44(49):16301-16309.
41. Zayas CL, Claas K, Escalante-Semerena JC: The CbiB protein of Salmonella enterica is an integral membrane protein involved in the last step of the de novo corrin ring biosynthetic pathway. J Bacteriol 2007, 189(21):7697-7708.
42. Escalante-Semerena JC: Conversion of cobinamide into adenosylcobamide in bacteria and archaea. J Bacteriol 2007, 189(13):4555-4560.
43. Woodson JD, Escalante-Semerena JC: CbiZ, an amidohydrolase enzyme required for salvaging the coenzyme B12 precursor cobinamide in archaea. Proc Natl Acad Sci USA 2004, 101(10):3591-3596.
44. Gray MJ, Escalante-Semerena JC: The cobinamide amidohydrolase (cobyric acid-forming) CbiZ enzyme: a critical activity of the cobamide remodelling system of Rhodobacter sphaeroides. Mol Microbiol 2009, 74(5):1198-1210.
thisshiftwiththeanalysisoftheDUF71/COG2102family,asubgroupofthePP-loopATPasefamily. Results: TheDUF71familycontainsatleasttwosubfamilies,oneofwhichwaspredictedtobethemissing diphthine-ammonialigase(EC6.3.1.14),Dph6.ThisenzymecatalyzesthelastATP-dependentstepinthesynthesis ofdiphthamide,acomplexmodificationofElongationFactor2thatcanbeADP-ribosylatedbybacterialtoxins. Dph6orthologsarefoundinnearlyallsequencedArchaeaandEucarya,asexpectedfromthedistributionofthe diphthamidemodification.TheDUF71familyappearstohaveoriginatedintheArchaea/Eucaryaancestorandto havebeensubsequentlyhorizontallytransferredtoBacteria.BacterialDUF71memberslikelyacquiredadifferent functionbecausethediphthamidemodificationisabsentinthisDomainofLife.In-depthinvestigationssuggest thatsomearchaealandbacterialDUF71proteinsparticipateinB12salvage. Conclusions: ThisdetailedanalysisoftheDUF71familymembersprovidesanexampleofthepowerofintegrated data-mimingforsolvingimportant missinggenes or missingfunction casesandillustratesthedangerof functionalannotationofproteinfamiliesbyhomologyalone. Reviewers names: ThisarticlewasreviewedbyArcadyMushegian,MichaelGalperinandL.Aravind. Keywords: Diphthamide,VitaminB12,Amidotransferase,ComparativegenomicsBackgroundInbothArchaeaandEucarya,thetranslationElongation Factor2(EF-2)harborsacomplexpost-translational modificationofastrictlyconservedhistidine(His699in yeast)calleddiphthamide[1].Thismodificationisthe targetofthediphtheriatoxinandthe Pseudomonas exotoxinA,whichinactivateEF-2byADP-ribosylationof thediphthamide[2,3].Althoughthediphthamidebiosynthesispathwaywasdescribedintheearly1980 s [2,3],thecorrespondingenzymeshaveonlyrecently beencharacterized. Invitro reconstitutionexperiments haveshownthatthefirststep,thetransferofa3-amino3-carboxypropyl(ACP)groupfrom S -adenosylmethionine(SAM)totheC-2positionoftheimidazoleringof thetargethistidineresidue,iscatalyzedinArchaeaby theiron-sulfur-clusterenzyme,Dph2[4,5](Figure1A). 
Geneticandcomplementationstudieshaveshownthat thecatalysisofthesamefirststeprequiresfourproteins (Dph1-Dph4)inyeastandothereukaryotes[6-9].The subsequentstep,trimethylationofanaminogroupto formthediphthineintermediate,iscatalyzedby diphthinesynthase,Dph5(EC2.1.1.98)(Figure1A) [10,11].Thelaststep,theATP-dependentamidationof thecarboxylategroup[12],iscatalyzedbydiphthine*Correspondence: vcrecy@ufl.edu1DepartmentofMicrobiologyandCellScience,UniversityofFlorida, Gainesville,FL32611,USA Fulllistofauthorinformationisavailableattheendofthearticle 2012deCrcy-Lagardetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsofthe CreativeCommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse, distribution,andreproductioninanymedium,providedtheoriginalworkisproperlycited.deCrcy-Lagard etal.BiologyDirect 2012, 7 :32 http://www.biology-direct.com/content/7/1/32 PAGE 2 ammonialigase(EC6.3.1.14),butthecorresponding genehasnotbeenidentified(http://www.orenza.u-psud. fr/).Aproteininvolvedinthislaststepwasrecently identifiedinyeast(YBR246WorDph7),butitismost certainlynotdirectlyinvolvedincatalysisasitisnot conservedinArchaeaanditcontainsaWD-domain likelytobeinvolvedinprotein/proteininteractions[13]. Usingacombinationofcomparativegenomic approaches,wesetouttoidentifyacandidategenefor thisorphanenzymefamily.Basedontaxonomic distribution,domainorganizationofgenefusions,physicalclusteringonchromosomes,atomicstructuraldata, co-expression,andphenotypedata,apromisingcandidatewasidentified,thefamilycalledDomainofUnknownFunctionfamilyDUF71(IPR002761)inInterpro [14].ThisfamilyisalsocalledATP_bind_4(PF01902)in Pfam[15]orPredictedATPasesofPP-loopsuperfamily (COG2102)intheClusterofOrtholousGroupdatabase [16].However,detailedanalysisoftheDUF71family revealedthatthisfamilyisalmostsurelynot Dph2 (+Dph1, Dph3, Dph4)EC 2.1.1.98 8Dph5EC 6.3.1.14SAM ATP, NH3 Dph6? 
(+Dph7) SA M Diphthine DiphthamideA BAdo-PseudoB12 R=deoxyadenosyl bCbiZ CDiphthine Diphthamide Dph6? Dph6? Figure1 StructuresofdiphthamideandB12precursorsandderivatives. ( A )Thecorediphthamidepathwayispredictedtocontainthree enzymesDph2,Dph5andDph6inArchaea.Theformationofdiphthinehasbeenreconstituted invitro usingDph2andDph5from Pyrococcus horikoshii [4,5].TheenzymefamilycatalyzingthelaststepinArchaeaandEukaryaDph6wasmissing.Inyeast,thefirstandlaststepsrequire additionalproteins(Dph1,Dph3andDph7).( B )PredictedDph6-catalyzedreactions.( C )Ado-Pseudo-B12structureandhydrolysissitebythe bacterialCbiZenzyme(bCbiZ).Parts( A )and( B )areadaptedwithpermissionfromXulingZhu;JungwooKim;XiaoyangSu;HeningLin; Biochemistry 2010,49,9649 9657.Copyright2010AmericanChemicalSociety. deCrcy-Lagard etal.BiologyDirect 2012, 7 :32 Page2of13 http://www.biology-direct.com/content/7/1/32 PAGE 3 isofunctional.SomeArchaeacontaintwoverydivergent copiesofthegene,whilehomologsarefoundinBacteria,whichareknowntolackdiphthamide.ThisobservationsuggeststhatsomeDUF71membershave differentfunctionsandprobablyparticipateindifferent biochemicalpathways.MethodsComparativegenomicsTheBLASTtools[17]andresourcesatNCBI(http:// www.ncbi.nlm.nih.gov/)wereroutinelyused.MultiplesequencealignmentswerebuiltusingClustalW[18]orMultialin[19].Proteindomainanalysiswasperformedusing thePfamdatabasetools(http://pfam.janelia.org/)[15]. Analysisofthephylogeneticdistributionandphysical clusteringwasperformedintheSEEDdatabase[20]. Resultsareavailableinthe Diphthamidebiosynthesis and DUF71-B12 subsystemonthepublicSEEDserver (http://pubseed.theseed.org/SubsysEditor.cgi).PhylogeneticprofilesearcheswereperformedontheIMGplatform [21]usingthephylogeneticquerytool(http://img.jgi.doe. 
gov/cgi-bin/w/main.cgi?section=PhylogenProfiler&page=phyloProfileForm).Physicalclusteringwasanalyzedwith theSEEDsubsystemcoloringtoolortheSeedviewer Compareregiontool[20]aswellasontheMicrobesOnline(http://www.microbesonline.org/)treebasedgenome browser[22].TheSPELLmicroarrayanalysisresource [23]wasusedthroughthe Saccharomyces GenomeDatabase(SGD)(http://www.yeastgenome.org/)[24]toanalyze yeastgenecoexpressionprofiles.Clusteringofyeastdeletionmutantsbasedonphenotypeanalysiswasanalyzed throughtheyeastfitnessdatabaseavailableathttp://fitdb. stanford.edu/[25,26].Mappingofgenedistributionprofile totaxonomictreesweregeneratedusingtheiTOLsuite (http://itol.embl.de/index.shtml)[27].Sequencelogos werederivedusingtheWebLogoplatform[28].StructureanalysisVisualizationandcomparisonofproteinstructuresand manualdockingofligandmoleculeswereperformed usingPyMol(ThePyMOLMolecularGraphicsSystem, Version1.4.1,Schrdinger,LLC).XtalView[7]wasused fortheproteindockingexercises.PhylogeneticanalysesThesurveyofthe1996completeprokaryoticgenomes availableattheNCBI(http://www.ncbi.nlm.nih.gov/) usingBLASTP[17](defaultparameters)allowedidentificationof119bacterialand144archaealDUF71homologsinadditiontothe182eukaryoteshomologs identifiedintheRefSeqdatabaseattheNCBI[29] (Additionalfile1:TableS1).Theretrievedsequences werealignedusingMAFFT[8]andtheresultingalignment wasvisuallyinspectedusingED,thealignmenteditorof theMUSTpackage[30].Thephylogeneticanalysisofthe 445sequencewasperformedusingtheneighbor-joining distancemethodimplementedinSeaView[31].The robustnessoftheresultingtreewasassessedbythenonparametricbootstrapmethod(100replicatesofthe originaldataset)implementedinSeaView.Asecondphylogeneticanalysisrestrictedto50archaealandeukaryotic homologsrepresentativeofthegeneticandgenomicdiversityofthesetwoDomainswasperformedusingthe BayesianapproachimplementedinPhylobayes[6]witha LGmodel.ResultsanddiscussionComparativegenomicspointstoDUF71/COG2102asa strongcandidateforthemissingdiphthamidesynthase 
familyThedistributionofknowndiphthamidebiosynthesis genesinArchaeawasanalyzedusingtheSEEDdatabase anditstools[20].The59archaealgenomesanalyzedall containedanEF-2encodinggene.AnalysisofthedistributionofDph2andDph5inArchaeashowedthat58/59 genomesencodedthesetwoproteins.Theonlyarchaeon lackingbothDph2andDph5was Korarchaeumcryptofilum OPF8(Figure2A).Wethereforehypothesizedthat thisorganismhaslostthediphthamidemodification pathwayevenifthe K.cryptofilum EF-2stillharborsthe conservedHisresidueatthesiteofthemodification (His603inthe K.cryptofilum sequence Accession B1L7Q0inUniprotKB).UsingtheIMG/JGIphylogenetic querytools[21],wesearchedforproteinfamiliesfound inallArchaeaexcept Korarchaeumcryptofilum OPF8, presentin Saccharomycescerevisiae and Homosapiens butabsentin Escherichiacoli and Bacillussubtilis ,as bacteriaareknowntolackthismodificationpathway. Onlyonefamily,DUF71/COG2102,followedthistaxonomicdistribution.ThisfamilyhadbeendescribedpreviouslyasaPP-loopATPaseofunknownfunction containingaRossmannoidclassHUPdomain[32]. 
UsingtheneighborhoodanalysistooloftheSEED database[20],physicalclusteringwasgenerallynot observedbetweenthe dph2 dph5 and DUF71 genesexceptinthree Methanosarcina genomeswherethe dph5 islocatedinthevicinityof DUF71 genes(Figure2B).If membersoftheDUF71catalyzethelaststepofdiphthamidesynthesistheyshouldbindATP[12].Structural analysisoftheDUF71proteinfrom Pyrococcusfuriosus (PF0828)revealsthepresenceoftwodistinctdomains: anN-terminalHUPdomainthatcontainsahighlyconservedPP-motifthatinteractswithATP(PDBid:3RK1) andAMP(PDBid:3RK0),andaC-terminal100-residue domainbelongingtoanovelfoldwithahighlyconserved motifGEGGEF/YE188T/S( P furiosus numbering) thatisprobablyinvolvedinsubstratebindingand recognition[33].deCrcy-Lagard etal.BiologyDirect 2012, 7 :32 Page3of13 http://www.biology-direct.com/content/7/1/32 PAGE 4 Coexpression,phenotypeandstructuraldatalinkthe yeastDUF71totranslationanddiphthamidebiosynthesisYLR143wistheonly S cerevisiae DUF71familymember.UsingYLR143wasinputintheSPELLcoexpressionquerytool[23]showedthatnearlyall co-expressedgeneswereinvolvedintranslationand ribosomebiogenesis(Additionalfile2:TableS2).This observationsuggestedthattheDUF71proteinfamily hasaroleintranslationasexpectedforaproteinmodifyingEF-2.Likeallknowndiphthamidesynthesisgenes, YLR143w isalsonotessential.Morespecifically,deletion ofanyofthefiveknowndiphthamidegenesconferssordarinresistanceinyeast[34,35]and ylr143w strainwas showntobeasresistanttothiscompoundasthe diphthamidedeficientstrains(seesupplementaldatain [34]).Furthermore,inarecentcompleteanalysisofrelationshipsbetweengenefitnessprofiles(co-fitness)and druginhibitionprofiles(co-inhibition)fromseveralhundredchemogenomicscreensinyeast[25,26]availableat http://fitdb.stanford.edu/,itwasfoundthatamongthe topteninteractorswithYLR143wbyhomozygouscosensitivityareDPH5,DPH2,DPH4(orJJJ3)andthe newlyidentifiedDPH7(orYBR246w)(Additionalfile3: FigureS1).Boththecoexpressionandphenotypedata therebystronglysupportthehypothesisthatYLR143w 
catalyzesthemissinglaststepofdiphthamidebiosynthesis,evenifonecannotruleoutatthisstagethat othercatalyticsubunitsyettobeidentifiedmayalsobe required. Finally,comparisonofATP-andAMP-containing structuresofPF0828revealsthattheactivesiteofthe formerhasanarrowgrooveattheendofwhichonlythe -phosphateofATPisexposedtothesolventwhereas theactivesiteofthelatteriswideopen(Figure3Aand B).Also,thereisasharpturnatthe -phosphateof ATP,suggestingthatitisthesiteofthenucleophilicattack.Wethereforeperformedadockingexerciseusing theEF-2structure(PDBid:3B82)[36]withtheATPBMethanohalophilusmahii YP_003542089.1 Methanosalsumzhilinae YP_004615790 Candidatus Nanosalina sp. J07AB43 EGQ42883.1 Duf71 (PF01902) Asn_Synthase_B_ C(cd01991) C Duf71 (PF01902) RidA (PF01042) RidA (PF01042) S. cerevisiae YLR143W Arabidopsis thaliana AT3g04480AMethanosarcinamazei Goe1 Methanosarcinaacetivorans C2A Methanosarcinabarkeri str. fusaro Dph5COG1849COG1885RbsK DUF71/ COG2102 Dph2 Dph5 Duf71 EF2 AsnB(COG0367) Gat-II (cd00352) Figure2 ComparativegenomicanalysisoftheDUF71family. ( A )DistributionofthecorediphthamidegenesDph2andDph5andofEF-2 andDUF71inArchaea,accordingtodataderivedfromthe Diphthamidebiosynthesis subsystemintheSEEDdatabase.Thetreeisaspeciestree constructediniTol(itol.embl.de/).Thepresenceandabsenceofthespecificgeneswasderivedfromthe Diphthamidebiosynthesis subsystem. ( B )PhysicalclusteringofDUF71/COG2102geneswithDph5inthree Methanosarcina genomesderivedfromtheMicrobesOnlinedatabase(http:// www.microbesonline.org/).( C )ExamplesofproteinscontainingdomainsfusedtoDUF71inArchaeaandEucarya.AccessionnumbersandCOG, CDD,orPfamdomainnumbersaregiveninparentheses. deCrcy-Lagard etal.BiologyDirect 2012, 7 :32 Page4of13 http://www.biology-direct.com/content/7/1/32 PAGE 5 containingstructureofPF0828.Thedockingrevealed thattheactivesitegrooveoftheATP-containingstructurecaneasilyaccommodatediphthinewithafewminor clashesbetweenthetwostructures(Figure3AandB). 
The modeling also showed that the carboxyl group of diphthine resides near the α-phosphate of ATP and the carboxylate group of residue Glu188, suggesting that nucleophilic attack by diphthine on the α-phosphate of ATP is highly feasible (Figure 3B). As shown in Figure 3B, the modeling also shows that several residues besides E188 that are highly conserved among archaeal and eukaryotic orthologs of PF0828 and YLR143w, including S44, Y45, E78, Y103, Q104, A149, E183 and E186 (Additional file 3: Figure S2), are at the interface of the modeled complex of PF0828 with EF-2, supporting the hypothesis that they play important roles in EF-2 recognition (Figure 3B).

Linking DUF71 family members to ammonia transfer reactions

The diphthine ammonia ligase reaction requires a source of NH3 [12]. Domain fusions involving members of the DUF71 family in the Pfam database [15] suggest that the source of NH3 might vary depending on the organism. For example, in a few Archaea (e.g. Methanohalophilus mahii DSM 5219, Methanosalsum zhilinae DSM 4017 or Candidatus Nanosalinarum sp. J07AB56), a COG0367/AsnB asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) domain is found at the N-terminus of the DUF71 domain (Figure 2C). This AsnB domain can be further separated into two subdomains, an N-terminal class-II glutamine amidotransferase domain (GAT-II) [37] and an Asn_Synthase_B_C PP-loop ATPase domain (Figure 2C). This domain organization suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes. On the other hand, in many eukaryotes such as yeast and Arabidopsis thaliana, two YjgF-YER057c-UK114-like domains are fused to the C-terminus of the DUF71 protein, as previously noted by Aravind et al. [32] (Figure 2C). The stand-alone members of the YjgF-YER057c-UK114 family, now called the RidA family (for reactive intermediate/imine deaminase A), have been shown to deaminate products generated by PLP-dependent enzymes, which results in the release of NH3 [38]. The RidA domains fused to DUF71 could therefore be involved in providing the NH3 moiety for diphthamide synthesis.

Figure 3 Structural analysis of the DUF71 (PF0828) putative active site. (A) Docking of modified EF-2 (cyan, PDB id: 3B82) onto the ATP-bound structure of PF0828 (yellow, PDB id: 3RK1). ATP and several residues of PF0828 (DUF71), which are conserved among archaeal and eukaryotic orthologs, and diphthine of EF-2 (see text for details) are shown in stick models. (B) Close-up stereo pair of panel A. Diphthine of EF-2 and the side chains of conserved residues of PF0828, at the interface of PF0828 and EF-2, are shown in stick models and labeled. (C) Stereo pair view of the ATP-binding region of PF0828. Residues that are conserved among the Dph6 and DUF71-B12 families are depicted in stick models with carbon atoms in cyan, while the residues that are specific to the Dph6 family are shown in stick models with carbon atoms in green. Oxygen and nitrogen atoms are shown in red and blue, respectively, in all stick models.

The DUF71 family is not monofunctional

The taxonomic distribution of DUF71 homologs in available complete genomes confirmed that DUF71 is present in one or occasionally two copies in all Archaea except the korarchaeon K. cryptofilum (Table 1 and Additional file 1: Table S1). This pattern is consistent with an ancient origin of the DUF71 gene in Archaea. In sharp contrast, DUF71 is sporadically distributed in Bacteria, being present only in a few representatives of some phyla (Table 1 and Additional file 1: Table S1). This pattern fits either with an ancient origin of DUF71 in Bacteria followed by numerous losses or, conversely, with a more recent acquisition followed by horizontal gene transfer (HGT) among bacterial lineages. To further investigate the evolutionary history of DUF71, we made a phylogenetic analysis of the homologs identified in the three Domains of Life. The resulting tree showed two divergent groups of sequences. The first group contains the eukaryotic and nearly all archaeal sequences (including the predicted yeast DPH6 (YLR143w) and P. furiosus PF0828), whereas the second encompasses all the bacterial sequences as well as the second copy found in a few archaeal genomes (Figure 4 and Additional file 3: Figure S3).
This second group emerged from within the archaeal sequences of the first cluster and showed various contradictions with the currently recognized taxonomy, because bacterial sequences from distantly related lineages appeared intermixed in the tree (Figure 4). These observations, together with the extremely patchy distribution of DUF71 in bacteria, strongly support the hypothesis that the bacterial DUF71 was of archaeal origin and spread through this domain mainly by HGT. Interestingly, the second homologs present in a few archaeal genomes emerged from bacterial sequences, suggesting that secondary HGT occurred from Bacteria to Archaea, allowing these archaea to acquire a second DUF71 homolog.

In contrast, a phylogenetic analysis focused on archaeal and eukaryotic sequences strongly supported the separation between these two Domains (posterior probabilities (PP) = 1). Moreover, it recovered the monophyly of most eukaryotic and archaeal major lineages (most PP > 0.95, Additional file 3: Figure S3), suggesting that DUF71 was present in their ancestors. However, as expected given the small number of amino acid positions analyzed (182 positions), the relationships among these lineages were mainly unresolved (most PP < 0.95), precluding an in-depth analysis of the ancient evolutionary history of DUF71 in Archaea and Eucarya (Additional file 3: Figure S3). Nevertheless, the wide distribution of DUF71 in these two Domains (even in highly derived parasites such as Microsporidia, Cryptosporidium, Entamoeba or Nanoarchaeum equitans, not shown) and its ancestral presence in most of their orders/phyla suggested that this gene was present in the last common ancestor of these two Domains. This inference does not imply, however, that no HGT occurred in these Domains. Indeed, some incongruence between the DUF71 phylogeny and the reference phylogeny of organisms [39] suggested putative cases of HGT. For instance, this was observed for the Thermofilum pendens DUF71, which robustly groups with Methanomicrobia (Euryarchaeota) and not with other Thermoproteales (Additional file 3: Figure S3).

[Figure 4: phylogenetic tree image. Leaves are labeled with organism names and RefSeq accession numbers and colored by taxonomic group (Eucarya, Archaea, Actinobacteria, Acidobacteria, Bacteroidetes, Chloroflexi, Deinococcus_Thermus, Firmicutes, Nitrospirae, Proteobacteria_Alpha, Proteobacteria_Beta, Proteobacteria_Delta, Proteobacteria_Gamma, PVC_Planctomycetes, PVC_Chlamydiae, Spirochaetes, Thermotogae). Three clusters are marked: Cluster 1/Dph6, Cluster 2/Duf71-B12 and Cluster 3. Scale bar: 0.1.]

Table 1 Taxonomic distribution of DUF71 homologs in archaeal and bacterial genomes

Phylum                    Nb (%) genomes
Archaea
Crenarchaeota             37/37 (100%)
Euryarchaeota             79/79 (100%)
Korarchaeota              0/1 (0%)
Thaumarchaeota            2/2 (100%)
Bacteria
Acidobacteria             3/7 (42.9%)
Actinobacteria            1/206 (0.5%)
Aquificae                 0/10 (0%)
Bacteroidetes             20/73 (27.4%)
Caldiserica               0/1 (0%)
Chlorobi                  0/11 (0%)
Chloroflexi               5/16 (31.3%)
Chrysiogenetes            0/1 (0%)
Cyanobacteria             0/45 (0%)
Deferribacteres           0/4 (0%)
Deinococcus-Thermus       2/17 (11.8%)
Dictyoglomi               0/2 (0%)
Elusimicrobia             0/2 (0%)
Fibrobacteres             0/2 (0%)
Firmicutes                20/484 (4.1%)
Fusobacteria              0/5 (0%)
Gemmatimonadetes          0/1 (0%)
Ignavibacteria            0/1 (0%)
Nitrospirae               1/3 (33.3%)
Proteobacteria_Alpha      2/204 (1%)
Proteobacteria_Beta       8/119 (6.7%)
Proteobacteria_Delta      1/48 (2.1%)
Proteobacteria_Epsilon    0/64 (0%)
Proteobacteria_Gamma      27/406 (6.7%)
PVC_Chlamydiae            1/73 (1.4%)
PVC_Planctomycetes        3/6 (50%)
PVC_Verrucomicrobia       0/4 (0%)
Spirochaetes              1/45 (2.2%)
Synergistetes             0/4 (0%)
Thermodesulfobacteria     0/2 (0%)
Thermotogae               5/14 (35.7%)

The number of genomes per phylum containing at least one homolog of DUF71 is indicated.
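The percentages in Table 1 follow directly from the genome counts. The short Python sketch below recomputes them for a subset of phyla and for the Archaea as a whole. The names `TABLE1_COUNTS`, `ARCHAEAL_PHYLA` and `percent` are illustrative and not part of the original analysis, and half-up rounding is an assumption, chosen because it reproduces the printed values (for example 5/16 as 31.3%).

```python
from decimal import Decimal, ROUND_HALF_UP

# (genomes with at least one DUF71 homolog, genomes surveyed), copied from Table 1.
# Only a subset of the bacterial phyla is listed here for brevity.
TABLE1_COUNTS = {
    "Crenarchaeota": (37, 37),
    "Euryarchaeota": (79, 79),
    "Korarchaeota": (0, 1),
    "Thaumarchaeota": (2, 2),
    "Acidobacteria": (3, 7),
    "Actinobacteria": (1, 206),
    "Bacteroidetes": (20, 73),
    "Chloroflexi": (5, 16),
    "Firmicutes": (20, 484),
    "Proteobacteria_Gamma": (27, 406),
    "Thermotogae": (5, 14),
}

ARCHAEAL_PHYLA = ["Crenarchaeota", "Euryarchaeota", "Korarchaeota", "Thaumarchaeota"]


def percent(present, total):
    """Percentage of genomes with a homolog, rounded half-up to one decimal."""
    value = Decimal(100 * present) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Per-phylum percentages.
for phylum, (present, total) in TABLE1_COUNTS.items():
    print(f"{phylum}: {present}/{total} ({percent(present, total)}%)")

# Domain-level summary for the Archaea.
archaeal_present = sum(TABLE1_COUNTS[p][0] for p in ARCHAEAL_PHYLA)
archaeal_total = sum(TABLE1_COUNTS[p][1] for p in ARCHAEAL_PHYLA)
print(f"Archaea: {archaeal_present}/{archaeal_total} "
      f"({percent(archaeal_present, archaeal_total)}%)")
```

Summed over the four archaeal phyla, this gives 118 of 119 genomes, which matches the statement that only the single korarchaeal genome lacks a DUF71 homolog.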
Because diphthamide is a modification specific to the archaeal and eukaryotic EF-2 proteins and bacteria lack all known diphthamide biosynthesis genes, we propose that cluster 1 in our phylogeny corresponds to bona fide Dph6 enzymes involved in diphthamide synthesis (Figure 4). This function therefore very likely represents the ancestral function of the whole DUF71 family. In contrast, bacteria do not synthesize diphthamide, suggesting that the bacterial DUF71 homologs and the few additional archaeal copies (cluster 2, Figure 4) are involved in another function, and thus that a functional shift occurred after the HGT of an archaeal bona fide Dph6 to bacteria. Notably, these genes (including PF0295, the second DUF71 copy found in P. furiosus) are strongly clustered on the chromosome with vitamin B12 salvage genes. More precisely, 75/102 are adjacent to vitamin B12 transporter genes (such as the BtuCDF genes) [40] and 18/102 are adjacent to cbiB genes encoding adenosylcobinamide-phosphate synthetase, an enzyme shared by the de novo and salvage pathways [41] (Figure 5A). These clustering data can be visualized in the Duf71-B12 subsystem in the SEED database, and two typical clusters are shown in Figure 5B. On this basis, we hypothesize that the archaeal and bacterial DUF71 genes that cluster with vitamin B12 genes have a role in B12 metabolism.

Figure 4 Neighbor-joining phylogenetic tree of the 445 DUF71 homologs identified in public databases. The scale bar represents the average number of substitutions per site. Numbers at nodes are bootstrap values. For clarity only values greater than 50% are indicated. Colors correspond to the taxonomic affiliation of sequences (see the box on the figure for details). The full tree of Cluster 1 is shown in Additional file 3: Figure S3.

[Figure 5: image. One panel shows gene clusters from Pyrobaculum calidifontis and Clostridium perfringens in which DUF71-B12 lies among cob, cbi and btu genes; the other panel is a pathway diagram of cobinamide salvage.]

Figure 5 Links between the DUF71 family and B12 salvage. (A) Summary of cobinamide derivative salvage in Bacteria and Archaea; arrows with dotted lines denote multiple steps. (B) Typical examples of physical clustering of DUF71-B12 genes with B12 salvage genes in Archaea and Bacteria. Abbreviations: Pseudo-B12, adenosylpseudocobalamin; Cbi, cobinamide; AdoCbi, adenosyl-Cbi; AdoCbi-P, adenosyl-Cbi-phosphate; AdoCby, adenosylcobyric acid; AP, (R)-1-amino-2-propanol; AP-P, AP-phosphate; Thr-P, L-threonine-phosphate; DMB, 5,6-dimethylbenzimidazole; AMP-AP, adenylate-AP; CobU, ATP:AdoCbi kinase, GTP:AdoCbi-GDP guanylyltransferase; CobY, NTP:AdoCbi-P nucleotidyltransferase; CobA, ATP:co(I)rrinoid adenosyltransferase; aCbiZ, adenosylcobinamide amidohydrolase; bCbiZ, pseudo-B12 amidohydrolase; CbiB, adenosylcobinamide-phosphate synthetase; CobD, L-threonine phosphate decarboxylase; CobS, cobalamin (5'-P) synthase; CobT, 5,6-dimethylbenzimidazole phosphoribosyltransferase; CobC or CobZ, alpha-ribazole-5'-phosphate phosphatase; cobY, adenosylcobinamide-phosphate guanylyltransferase; CbiP, cobyric acid synthase; BtuFCD, cobamide transporter subunits.

Finally, some bacterial DUF71 proteins might also have other functions, because a set of bacteria such as Clostridium perfringens have two or more DUF71 homologs (Figure 4 and Additional file 1: Table S1). The most extreme example is Dehalococcoides sp. CBDB1, which encodes five DUF71 homologs in its genome. In the case of C. perfringens ATCC 13124 and SM101, one homolog (YP_695745 and YP_698440) clusters both physically and phylogenetically (Figure 4 and 5A) with the B12 subgroup proteins, whereas the second homolog (YP_695178 and YP_698039) is related to Acinetobacter baumannii (Cluster 3, Figure 4) and is not found associated with gene clusters related to B12 salvage (data not shown).
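The gene-neighborhood counts reported above for the 102 DUF71-B12 genes (75 next to B12 transporter genes, 18 next to cbiB) are easier to compare as fractions. The sketch below simply converts them to percentages. The variable names are illustrative, and because the text does not say whether the two categories overlap, the code deliberately does not add them together.

```python
# Counts quoted in the text for the 102 genes of the DUF71-B12 subgroup.
TOTAL_DUF71_B12 = 102
NEIGHBOR_COUNTS = {
    "B12 transporter genes (e.g. BtuCDF)": 75,
    "cbiB": 18,
}


def as_percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for neighbor, count in NEIGHBOR_COUNTS.items():
    print(f"adjacent to {neighbor}: {count}/{TOTAL_DUF71_B12} "
          f"({as_percent(count, TOTAL_DUF71_B12)}%)")
```

Under these numbers roughly three quarters of the subgroup sits next to a B12 transporter gene, which is the basis for calling the association strong.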
Therefore, based on phylogenetic and physical clustering, the DUF71 proteins were split into two subgroups, Dph6 and DUF71-B12, which were annotated as such and captured in the Diphthamide biosynthesis and Duf71-B12 subsystems in the SEED database.

Predicting the function of members of the DUF71-B12 subgroup

As members of the DUF71-B12 subgroup clustered strongly with B12 transport genes and with cbiB (Figure 5B), we focused on the early steps of B12 salvage, which are quite diverse because several forms of cobamides [cobalamin-like or Cbl-like compounds] can be salvaged (Figure 5A). Cobinamide (Cbi) is adenosylated after transport to form adenosylcobinamide (AdoCbi). In most bacteria, AdoCbi is directly phosphorylated by CobU before being transformed after several steps into adenosylcobalamin (AdoCbl or coenzyme B12), in which the lower ligand is 5,6-dimethylbenzimidazole (DMB) (see [42] for review) (Figure 5A). Archaea use a different salvage route in which AdoCbi is converted to adenosylcobyric acid (AdoCby), an intermediate of the de novo pathway, by an amidohydrolase, aCbiZ [43] (Figure 5A). AdoCby is then converted directly to adenosylcobinamide-phosphate (AdoCbi-P) by CbiB. Finally, some bacteria have CbiZ homologs (bCbiZ) that hydrolyze adenosylpseudocobalamin (Ado-Pseudo-B12) [44], which contains an adenine instead of DMB as its lower ligand (Figure 1C and 5A).
In order to gain insight into the possible function of DUF71-B12 family members, we analyzed the co-distribution pattern of CbiZ, CbiB and DUF71-B12 proteins in Archaea and Bacteria. Interestingly, with a few exceptions, all prokaryotic genomes encoding CbiB harbor either CbiZ or DUF71-B12 (Figure 6). In bacteria, however, there was a strict anti-correlation between the DUF71-B12 and the CbiZ families (Figure 6A). This was not the case in Archaea, where quite a few organisms (such as P. furiosus or Methanosarcina mazei Go1) harbored both families (Figure 6B). This distribution profile suggests that members of the DUF71-B12 subfamily fulfil the same roles as the bacterial CbiZ enzymes (bCbiZ), either by catalysing the same reaction (cleaving Ado-pseudo-B12 into AdoCby) or by providing another route to salvaging Pseudo-B12. This hypothesis would explain why bacteria would have one or the other while Archaea could carry both (Figure 6B), because archaeal CbiZ proteins have been predicted to lack pseudo-B12 cleavage activity [44].

Detailed analysis of the signature motifs of the two subfamilies reveals that the strictly conserved EGGE/DXE188 motif (P. furiosus PF0828 numbering) in Dph6 proteins is replaced by an ENGEF/YH188 motif in the DUF71-B12 proteins (Additional file 3: Figure S2 and Additional file 3: Figure S4). In the Dph6 family, E188 is located near the predicted diphthine binding site and is predicted to be involved in catalysis (Figure 3B). The replacement of the strictly conserved E188 residue by a histidine residue strongly suggests a change in the reaction catalyzed by the DUF71-B12 subfamily compared to the Dph6 family.
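The two signature motifs described above can be written as regular expressions, which gives a quick first-pass way to sort a DUF71 sequence into one subgroup or the other. The Python sketch below assumes that EGGE/DXE means E-G-G-(E or D)-any residue-E and that ENGEF/YH means E-N-G-E-(F or Y)-H; the function name `classify_duf71` and the toy sequences are hypothetical, and a motif match is only a hint, not a substitute for the phylogenetic and gene-neighborhood evidence used in the paper.

```python
import re

# Assumed reading of the motif notation in the text (last residue = position 188
# in PF0828 numbering): E in Dph6, H in DUF71-B12.
DPH6_MOTIF = re.compile(r"EGG[ED].E")
DUF71_B12_MOTIF = re.compile(r"ENGE[FY]H")


def classify_duf71(sequence):
    """Label a protein sequence by which signature motif it contains."""
    seq = sequence.upper()
    if DPH6_MOTIF.search(seq):
        return "Dph6-like"
    if DUF71_B12_MOTIF.search(seq):
        return "DUF71-B12-like"
    return "unclassified"


# Toy sequences invented for illustration; they are not real proteins.
examples = {
    "toy_dph6": "MKAEGGEFETLV",
    "toy_b12": "MKAENGEYHTLV",
    "toy_other": "MKAAAAAAATLV",
}

for name, seq in examples.items():
    print(name, classify_duf71(seq))
```

The Dph6 pattern is checked first, so a sequence carrying both motifs would be labeled Dph6-like; real use would more sensibly anchor the match to the aligned position rather than search the whole sequence.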
The structure-based comparison between the two subfamilies also strongly supports the hypothesis that their substrates are different, because all residues predicted to be involved in EF-2 binding (Figure 3B, see section above) are different in the DUF71-B12 subfamily but mostly conserved within this subfamily (Additional file 3: Figure S2 and residues in green in Figure 3C). Residues that are conserved between the two DUF71 subfamilies (Additional file 3: Figure S2 and residues in cyan in Figure 3C) are found around the phosphate groups of ATP, including S12, G13, G14, K15, D16, H48, and T189 (PF0828 sequence numbering), or belong to the C-terminal conserved sequence motif (EGGE/D-X-E188), such as G182, G184, G185, E186, F187 (Additional file 3: Figure S2 and Figure 3C). Further experimental studies will be required to determine whether DUF71-B12 proteins are Ado-pseudo-B12 amidohydrolases or have another role in Ado-pseudo-B12 salvage.

Conclusions

Our detailed analyses of the DUF71 family members presented here provide an example of the power of comparative genomic approaches for solving important "missing genes" or "missing function" cases. These analyses simultaneously illustrate the difficulties inherent in accurately annotating gene families. On one hand, the evidence identifying a candidate for the missing Dph6 gene family, drawn from genomic data (mainly phylogenetic distribution and gene fusions) and post-genomic data (structure, co-expression analysis and genome-wide phenotype experiments), is so strong that it could be used as an example where the functional annotation of a protein of unknown function could be derived from comparative genomics alone. On the other hand, our analyses show that a subgroup of the DUF71 family is most certainly involved in a metabolic pathway unrelated to diphthamide synthesis and that transferring functional annotations from homology scores alone would be inappropriate in this case. We believe that this integrated functional annotation approach will play an important role in future pipelines for annotation of protein families.

Additional files

Additional file 1: Table S1. Genbank RefSeq identities and corresponding organisms for all proteins used in the phylogenies.

Additional file 2: Table S2. GO Term Enrichment Spell analysis (http://imperio.princeton.edu:3000/yeast) with YLR143w as input.

Additional file 3: Figure S1. Top 10 interactors with YLR143W by homozygous co-sensitivity in S. cerevisiae (from the Yeast fitness database http://fitdb.stanford.edu/fitdb.cgi?query=YLR143W). Figure S2 Multiple sequence alignment of selected Dph6 family and DUF71-B12 family sequences generated using the Multalin platform (http://multalin.toulouse.inra.fr/multalin/). Strictly conserved residues between the two families are in red. Residues conserved only in the Dph6 family are boxed in green. Residues found around the phosphate group of ATP are noted by red arrows. Secondary structural elements, yellow rectangles for α-helix and cyan arrows for β-strand, shown above the alignment, are from the crystal structure of P. furiosus Dph6 (PF0828) (PDB id: 3RK1). Figure S3 Bayesian tree of archaeal and eukaryotic Dph6 sequences. The scale bar represents the average number of substitutions per site. Numbers at nodes represent posterior probabilities. For clarity only values greater than 0.85 are indicated. Figure S4 (Top) Sequence logo derived from 95 Dph6 sequences extracted from the Diphthamide subsystem in SEED. The E188 residue (PF0828 numbering) is located at position 10 in the logo. (Bottom) Sequence logo of the corresponding region derived from 102 DUF71-B12 sequences extracted from the DUF71-B12 subsystem in SEED. Both logos were made at http://weblogo.berkeley.edu/logo.cgi based on ClustalW-derived alignments.

Figure 6 Distribution of DUF71-B12, CbiZ and CbiB in bacterial (A) and archaeal genomes (B). The trees are species trees constructed in iTOL (itol.embl.de/); the presence and absence of the specific genes was derived from the DUF71-B12 subsystem in the SEED database.

Competing interests
The author(s) declare that they have no competing interests.
Authors' contributions
VdC-L conducted the comparative genomic analysis and made the functional predictions. CB-A performed the phylogenetic analysis. FF, LT and JFH did the structural analysis. All authors participated in writing/reviewing the manuscript. All authors read and approved the final manuscript.

Reviewers' comments

Reviewer number 1: Arcady Mushegian
Stowers Institute for Medical Research, 1000 E 50th Street, Kansas City, Missouri 64110

The study by de Crecy-Lagard and co-authors pinpoints the DUF71/COG2102 family as the most likely archaeal/eukaryotic ATP-dependent diphthine-ammonia ligase, the so far unaccounted-for enzyme in the pathway of diphthamide biosynthesis, which is responsible for the formation of a unique derivative of the conserved histidine within translation elongation factor 2. A distinct subfamily of this protein family appears to play another role in bacteria and a subset of archaea, most likely in the salvage of an intermediate of cobalamin biosynthesis. The evidence presented in the paper consists of genome context information, sequence-structure prediction and the data from yeast concerning gene expression and chemical-genomics profiling. Taken together, the evidence seems compelling to me. The data from yeast represent partial functional validation of predictions made for prokaryotes. I would recommend only to tone down the suggestion that all this is a "novel paradigm" in analysis of gene function: researchers have been inferring gene functions from phenotypes, as well as from directly detected changes in genotype, for a long, long time, and the current study is a logical extension of these approaches. What is different in the last 15 years is that we can compare these properties across many species with completely sequenced genomes; but even this is a logical extension of the previous work (compare, for example, with work from the Yanofsky and Jensen labs on biosynthesis of aromatic amino acids). It was not any prescription of a previous scientific paradigm that constrained the work, but rather the lack of the data.

Response: The references to a "novel paradigm" were eliminated in the abstract and the introduction as suggested.
Reviewer number 2: Michael Galperin
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

The paper by de Crecy-Lagard and colleagues is a fine example of using comparative genomics to patch the remaining holes in the metabolic pathways. The key conclusion of this work, prediction of the participation of the members of the DUF71/COG2102 family in diphthamide biosynthesis in archaea and eukaryotes and in B12 metabolism in some bacteria and archaea, is extremely convincing and hardly even needs an experimental verification. The second conclusion, that ammonia used in the diphthine ammonia ligase-catalyzed reaction in different organisms could be generated by two different enzymes, asparagine synthetase and the RidA domain, also sounds convincing. However, proving beyond reasonable doubt that DUF71/COG2102 family members with their ATP-pyrophosphatase activity comprise the key part of diphthine ammonia ligase does not prove that they are the only subunits of this enzyme. Even if the proposed reaction scheme (Figure 1B) is correct, there still might be a need for a ligase subunit that couples removal of the AMP moiety from EF2 with its amidation. There is a definite possibility that DUF71/COG2102 family members catalyze all these individual reactions, e.g. using their unique C-terminal 100-aa domain, but that would have to be proven experimentally. The reported involvement of the likely scaffold protein YBR246w (DPH7) appears to support the idea that diphthine ammonia ligase consists of more than one type of subunit. Otherwise, it is a great paper that vividly demonstrates the power of comparative-genomics approaches.
Response: We added a phrase stating that "even if one cannot rule out at this stage that other catalytic subunits yet to be identified may also be required".

Reviewer number 3: L. Aravind
NCBI, NLM, NIH Computational Biology Branch, 8600 Rockville Pike MSC 6075, Building 38A, Room 6N601, Bethesda, MD 20894-6075

This work uses contextual information to identify the diphthine-ammonia ligase in archaea and eukaryotes. It also shows that the correct ligase is not the yeast protein YBR246W but rather the MJ0570-like PP-loop ATPases. The authors also show that this family has been transferred to certain bacteria, where they infer that it is likely to have undergone a functional shift to participate in B12 salvage. They cautiously propose that it might replace CbiZ and function as an amidohydrolase (the reverse of the typical PP-loop ATPase reaction) rather than as a ligase. The conclusions are definitive and the article makes a useful contribution to the understanding of protein modification and cofactor biosynthesis. This said, there are certain issues with the current form of the article that the authors necessarily need to address in their revision:

1) (pg 8) The authors state that the MJ0570-like enzymes have a HUP domain followed by a distinct C-terminal domain. They do not explain the meaning of this properly nor cite the reference of the paper (PMID: 12012333) pertaining to the HUP domains where this family was identified as a PP-loop ATPase, along with the observations (Table 1 in that reference) that it has a primarily archaeo-eukaryotic phyletic pattern, and that eukaryotic versions might be fused to two C-terminal domains of the YabJ-like chorismate lyase fold (now termed RidA). It should be stated that the N-terminus is a PP-loop ATPase domain of the HUP class of Rossmannoid domains; not all HUP domains are ligases, only the PP-loop and the HIGH nucleotidyltransferases. This clarifies that it is related to other ATP-utilizing amidoligases such as NAD synthetase, GMP synthetase and asparagine synthetase. This would place their inferred amidoligase activity in the context of comparable, known amidoligase activities of related enzymes. In fact it would be advisable to place the fact that these are PP-loop enzymes in the abstract itself.
Response: The following sentence was added: "This family had previously been described as a PP-loop ATPase of unknown function containing a Rossmannoid class HUP domain (Aravind et al. 2002)." A reference to the PP-loop ATPase family was added in the abstract as requested. A reference to the same work was added when talking about the RidA fusion. For the phylogenetic distribution, the results presented here are a bit different from the previous study because many more genomes are available after 10 years and we show that the family is also bacterial.

2) The authors persistently refer to the domain as DUF71. This name is no longer current in Pfam and it has long been recognized, as mentioned in the reference noted above, that these proteins are not "domains of unknown function" but PP-loop ATPases. The domain is correctly termed ATP_bind_4 (PF01902) in Pfam. This Pfam (not the misleading DUF71) name and Pfam number should be indicated with just a statement in the introduction that it was formerly DUF71.

Response: This domain is currently called "Domain of unknown function DUF71, ATP-binding domain" in the InterPro database (IPR002761) even if it is called ATP_bind_4 (PF01902) in Pfam. It is much shorter to use (as well as easier for the reader to follow) the DUF71 abbreviation rather than the ATP_bind_4 abbreviation. We therefore prefer to keep DUF71. We however introduced a statement giving the different names of this domain in the InterPro, Pfam and COG databases at the end of the introduction.
3) The authors apparently have a misapprehension regarding the Methanohalophilus mahii protein, both in the text and in the domain architecture rendered in the figure. First, these proteins have two N-terminal domains fused to the MJ0570-like module: namely an N-terminal class-II glutamine amidotransferase (GAT-II, e.g. see PMID: 20023723) and a second PP-loop ATPase domain thereafter (i.e. one related to asparagine synthetase). This GAT domain, as in the case of other PP-loop enzymes, could supply ammonia by cleaving it off glutamine. But this does not explain which PP-loop domain utilizes it. In the case of the Asn-synthetase it is used by the cognate PP-loop domain. In this case the presence of two PP-loop domains suggests that it is either utilized by both for different reactions or else the second domain does not receive the NH3 from this GAT. This also leads to the question of what reaction the Asn synthetase-like PP-loop domain is catalyzing.

Quality of written English: Acceptable

Response: The source of the confusion came from the fact that the AsnSynthase domain (AsnB) contains two domains, the GAT-II domain and the Asn_Synthase_B_C PP-loop ATPase domain. Both the figure and the text were modified to avoid the confusion. Based on the reviewer's comments the sentence discussing the potential role of the AsnB domain was modified as follows: "This domain organization strongly suggests that in this subset of enzymes, the hydrolysis of glutamine catalyzed by the fused GAT-II domain could provide the NH3 moiety to both the DUF71 and the Asn_Synthase_B_C enzymes."

4) Based on phyletic complementarity the authors suggest that bacterial CbiZ might be displaced by the bacterial MJ0570-like enzymes. This seems unusual. Why utilize a PP-loop ATPase for the reverse reaction, i.e. amidohydrolase? Typically there is little overlap between the families involved in amidohydrolase as opposed to ATP-dependent ligase activity. Of the almost 12 distinct major inventions of amidoligase activity, hardly any representatives of these superfamilies have been reused as amidohydrolases.
So do the authors note anything special in the case of the bacterial representatives that might support such a functional shift?

Response: This hypothesis is derived from phylogenetic distribution and it is not unprecedented that ligases and hydrolases are found in the same family (see example in PMID: 12359880). However, we agree that this hypothesis derives mainly from phylogenetic pattern analysis and, beyond the differences in the predicted substrate binding pocket found in the DUF71-B12 family, we did not identify specific changes that could point to a shift to hydrolase, hence our caution in our prediction as stated in the text.

Quality of written English: Acceptable

Acknowledgements
This work was supported by the US National Science Foundation (grant MCB-1153413 to V.dC-L), the US National Institutes of Health (grant U54GM094597 to G.T. Montelione and the Northeast Structural Genomics Consortium) and the Agence Nationale pour la Recherche (grant ANR-10-BINF-01-0127 Ancestrome) to C.B-A. We thank Raffael Schaffrath and Mike Stark for sharing unpublished diphthamide-related data and for critical evaluation of manuscript parts. We thank Jorge Escalante-Semerena for sharing his immense knowledge on B12 salvage pathways, Diana Downs for disclosing unpublished results on RidA function, Manal Swairjo for chemical insight, and Andrew Hanson for helpful input on the manuscript.

Author details
1 Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA. 2 Department of Biological Sciences, Columbia University, Northeast Structural Genomics Consortium, 1212 Amsterdam Ave, New York, NY 10027, USA. 3 Université de Lyon; Université Lyon 1; CNRS; UMR5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 Novembre 1918, Lyon, Villeurbanne F-69622, France.

Received: 17 July 2012 Accepted: 18 September 2012 Published: 26 September 2012

References
1. Greganova E, Altmann M, Bütikofer P: Unique modifications of translation elongation factors. FEBS J 2011, 278(15):2613-2624.
2. Van Ness BG, Howard JB, Bodley JW: ADP-ribosylation of elongation factor 2 by diphtheria toxin. Isolation and properties of the novel ribosyl-amino acid and its hydrolysis products. J Biol Chem 1980, 255(22):10717-10720.
3.VanNessBG,HowardJB,BodleyJW: ADP-ribosylationofelongationfactor 2bydiphtheriatoxin.NMRspectraandproposedstructuresof ribosyl-diphthamideanditshydrolysisproducts. JBiolChem 1980, 255 (22):10710 10716. 4.ZhangY,ZhuX,TorelliAT,LeeM,DzikovskiB,KoralewskiRM,WangE, FreedJ,KrebsC,EalickSE, etal : Diphthamidebiosynthesisrequiresan organicradicalgeneratedbyaniron sulphurenzyme. Nature 2010, 465 (7300):891 896. 5.ZhuX,DzikovskiB,SuX,TorelliAT,ZhangY,EalickSE,FreedJH,LinH: Mechanisticunderstandingof Pyrococcushorikoshii Dph2,a[4Fe-4S] enzymerequiredfordiphthamidebiosynthesis. MolBiosyst 2011, 7 (1):74 81. 6.LartillotN,LepageT,BlanquartS: PhyloBayes3:aBayesiansoftware packageforphylogeneticreconstructionandmoleculardating. Bioinformatics 2009, 25 (17):2286 2288. 7.McReeDE: XtalView/Xfit Aversatileprogramformanipulatingatomic coordinatesandelectrondensity. JStructBiol 1999, 125 (2 3):156 165. 8.KatohK,MisawaK,KumaK-I,MiyataT: MAFFT:anovelmethodforrapid multiplesequencealignmentbasedonfastFouriertransform. Nucleic AcidsRes 2002, 30 (14):3059 3066. 9.WebbTR,CrossSH,McKieL,EdgarR,VizorL,HarrisonJ,PetersJ,JacksonIJ: DiphthamidemodificationofeEF2requiresaJ-domainproteinandis essentialfornormaldevelopment. JCellSci 2008, 121 (19):3140 3145. 10.ZhuX,KimJ,SuX,LinH: Reconstitutionofdiphthinesynthaseactivity invitro. Biochemistry 2010, 49 (44):9649 9657.11.MattheakisLC,ShenWH,CollierRJ: DPH5,amethyltransferasegene requiredfordiphthamidebiosynthesisin Saccharomycescerevisiae Mol CellBiol 1992, 12 (9):4026 4037. 12.MoehringTJ,DanleyDE,MoehringJM: Invitrobiosynthesisof diphthamide,studiedwithmutantChinesehamsterovarycellsresistant todiphtheriatoxin. MolCellBiol 1984, 4 (4):642 650. 13.SuX,ChenW,LeeW,JiangH,ZhangS,LinH: YBR246Wisrequiredforthe thirdstepofdiphthamidebiosynthesis. JAmChemSoc 2011, 134 (2):773 776. 14.HunterS,JonesP,MitchellA,ApweilerR,AttwoodTK,BatemanA,Bernard T,BinnsD,BorkP,BurgeS, etal : InterProin2011:newdevelopmentsin thefamilyanddomainpredictiondatabase. 
NucleicAcidsRes 2012, 40 (D1):D306 D312. 15.FinnRD,MistryJ,TateJ,CoggillP,HegerA,PollingtonJE,GavinOL, GunasekaranP,CericG,ForslundK, etal : ThePfamproteinfamilies database. NucleicAcidsRes 2010, 38 (suppl_1):D211 D222. 16.TatusovR,FedorovaN,JacksonJ,JacobsA,KiryutinB,KooninE,KrylovD, MazumderR,MekhedovS,NikolskayaA, etal : TheCOGdatabase:an updatedversionincludeseukaryotes. BMCBioinforma 2003, 4 (1):41. 17.AltschulSF,MaddenTL,SchafferAA,ZhangJ,ZhangZ,MillerW,LipmanDJ: GappedBLASTandPSI-BLAST:anewgenerationofproteindatabase searchprograms. NucleicAcidsRes 1997, 25 (17):3389 3402. 18.LarkinMA,BlackshieldsG,BrownNP,ChennaR,McGettiganPA,McWilliam H,ValentinF,WallaceIM,WilmA,LopezR, etal : ClustalWandClustalX version2.0. Bioinformatics 2007, 23 (21):2947 2948. 19.CorpetF: Multiplesequencealignmentwithhierarchicalclustering. NucleicAcidsRes 1988, 16 (22):10881 10890. 20.OverbeekR,BegleyT,ButlerRM,ChoudhuriJV,ChuangHY,CohoonM,de Crcy-LagardV,DiazN,DiszT,EdwardsR, etal : Thesubsystemsapproach togenomeannotationanditsuseintheprojecttoannotate1000 genomes. NucleicAcidsRes 2005,33 (17):5691 5702. 21.MarkowitzVM,ChenI-MA,PalaniappanK,ChuK,SzetoE,GrechkinY,Ratner A,AndersonI,LykidisA,MavromatisK, etal : Theintegratedmicrobial genomessystem:anexpandingcomparativeanalysisresource. Nucleic AcidsRes 2010, 38 (suppl1):D382 D390. 22.AlmEJ,HuangKH,PriceMN,KocheRP,KellerK,DubchakIL,ArkinAP: The MicrobesOnlinewebsiteforcomparativegenomics. GenomeRes 2005, 15 (7):1015 1022. 23.HibbsMA,HessDC,MyersCL,HuttenhowerC,LiK,TroyanskayaOG: Exploringthefunctionallandscapeofgeneexpression:directedsearch oflargemicroarraycompendia. Bioinformatics 2007, 23 (20):2692 2699. 24.CherryJM,HongEL,AmundsenC,BalakrishnanR,BinkleyG,ChanET, ChristieKR,CostanzoMC,DwightSS,EngelSR, etal : Saccharomyces genomedatabase:thegenomicsresourceofbuddingyeast. NucleicAcids Res 2012, 40 (D1):D700 D705. 
25.HillenmeyerM,EricsonE,DavisR,NislowC,KollerD,GiaeverG: Systematic analysisofgenome-widefitnessdatainyeastrevealsnovelgene functionanddrugaction. GenomeBiol 2010, 11 (3):R30. 26.HillenmeyerME,FungE,WildenhainJ,PierceSE,HoonS,LeeW,ProctorM, St.OngeRP,TyersM,KollerD, etal : Thechemicalgenomicportraitof yeast:uncoveringaphenotypeforallgenes. Science 2008, 320 (5874):362 365. 27.LetunicI,BorkP: InteractiveTreeOfLife(iTOL):anonlinetoolforphylogenetic treedisplayandannotation. Bioinformatics 2007, 23 (1):127 128. 28.CrooksGE,HonG,ChandoniaJ-M,BrennerSE: WebLogo:asequencelogo generator. GenomeRes 2004, 14 (6):1188 1190.deCrcy-Lagard etal.BiologyDirect 2012, 7 :32 Page12of13 http://www.biology-direct.com/content/7/1/32 PAGE 13 29.PruittKD,TatusovaT,BrownGR,MaglottDR: NCBIReferencesequences (RefSeq):currentstatus,newfeaturesandgenomeannotationpolicy. Nucleic AcidsRes 2012, 40 (D1):D130 D135. 30.PhilippeH: MUST,acomputerpackageofmanagementutilitiesfor sequencesandtrees. NucleicAcidsRes 1993, 21 (22):5264 5272. 31.GouyM,GuindonS,GascuelO: SeaViewVersion4:amultiplatformgraphical userinterfaceforsequencealignm entandphylogenetictreebuilding. Mol BiolEvol 2010, 27 (2):221 224. 32.AravindL,AnantharamanV,KooninEV: MonophylyofclassIaminoacyltRNA synthetase,USPA,ETFP,photolyase,andPP-ATPasenucleotide-binding domains:implicationsforproteinevolutionintheRNAworld. Proteins: Structure,FunctBioinform 2002, 48 (1):1 14. 33.ForouharF,SaadatN,HussainM,Seethara manJ,LeeI,JanjuaH,XiaoR,ShastryR, ActonTB,MontelioneGT, etal : Alargeconformationalchangeintheputative ATPpyrophosphatasePF0828inducedbyATPbinding. ActaCrystallogrSectF StructBiolCrystCommun 2011, 67 (11):1323 1327. 34.BotetJ,Rodriguez-MateosM,BallestaJPG,RevueltaJL,RemachaM: A chemicalgenomicscreenin Saccharomycescerevisiae revealsarolefor diphthamidationoftranslationElongationFactor2ininhibitionof proteinsynthesisbySordarin. AntimicrobAgentsChemother 2008, 52 (5):1623 1629. 
35.BrC,ZabelR,LiuS,StarkMJR,SchaffrathR: Aversatilepartnerof eukaryoticproteincomplexesthatisinvolvedinmultiplebiological processes:Kti11/Dph3. MolMicrobiol 2008, 69 (5):1221 1233. 36.JorgensenR,WangY,VisschedykD,MerrillAR: Thenatureandcharacterof thetransitionstatefortheADP-ribosyltransferasereaction. EMBORep 2008, 9 (8):802 809. 37.IyerLM,AbhimanS,MaxwellBurroughsA,AravindL: Amidoligaseswith ATP-grasp,glutaminesynthetase-likeandacetyltransferase-likedomains: synthesisofnovelmetabolitesandpeptidemodificationsofproteins. MolBioSys 2009, 5 (12):1636 1660. 38.LambrechtJA,FlynnJM,DownsDM: ConservedYjgFproteinfamily deaminatesreactiveenamine/imineintermediatesofPyridoxal5 Phosphate(PLP)-dependentenzymereactions.JBiolChem 2012, 287 (5):3454 3461. 39.Brochier-ArmanetC,ForterreP,GribaldoS: Phylogenyandevolutionofthe Archaea:onehundredgenomeslater. CurrOpinMicrobiol 2011, 14 (3):274 281. 40.BorthsEL,PoolmanB,HvorupRN,LocherKP,ReesDC: Invitrofunctional characterizationofBtuCD-F,the Escherichiacoli ABCtransporterfor vitaminB12uptake. Biochemistry 2005, 44 (49):16301 16309. 41.ZayasCL,ClaasK,Escalante-SemerenaJC: TheCbiBproteinof Salmonella enterica isanintegralmembraneproteininvolvedinthelaststepofthe denovocorrinringbiosyntheticpathway. JBacteriol 2007, 189 (21):7697 7708. 42.Escalante-SemerenaJC: Conversionofcobinamideinto adenosylcobamideinbacteriaandarchaea. JBacteriol 2007, 189 (13):4555 4560. 43.WoodsonJD,Escalante-SemerenaJC: CbiZ,anamidohydrolaseenzyme requiredforsalvagingthecoenzymeB12precursorcobinamidein archaea. ProcNatlAcadSciUSA 2004, 101 (10):3591 3596. 44.GrayMJ,Escalante-SemerenaJC: Thecobinamideamidohydrolase(cobyric acid-forming)CbiZenzyme:acriticalactivityofthecobamide remodellingsystemof Rhodobactersphaeroides MolMicrobiol 2009, 74 (5):1198 1210.doi:10.1186/1745-6150-7-32 Citethisarticleas: deCrcy-Lagard etal. : Comparativegenomicanalysis oftheDUF71/COG2102familypredictsrolesindiphthamide biosynthesisandB12salvage. BiologyDirect 2012 7 :32. 
|