De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods

Richa Mudgal et al. Biol Direct. 2015 Jul 31;10:38.
doi: 10.1186/s13062-015-0069-2.

Abstract

Background: In the post-genomic era, where sequences are being determined at a rapid rate, we rely heavily on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to "Domains of Unknown Function" (DUF) or "Uncharacterized Protein Family" (UPF), of which 3,087 families have no reported three-dimensional structure. These constitute almost one-fourth of the known protein families and still await both structural and functional characterization.

Results: We applied a 'computational structural genomics' approach using five state-of-the-art remote similarity detection methods to detect relationships between uncharacterized DUFs and domain families of known structure. The association with a structural domain family could serve as a starting point in elucidating the function of a DUF. Among these five methods, searches in the SCOP-NrichD database are applied here for the first time. Predictions were classified as high, medium or low confidence based on the consensus of results from the various approaches, and were also annotated with enzyme and Gene Ontology terms. A total of 614 uncharacterized DUFs could be associated with a known structural domain, of which high-confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed online at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and the extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659.
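The consensus-based confidence classification described in the Results can be sketched as follows. This is a minimal illustration, not the authors' pipeline: only the rule "high confidence = at least four methods" comes from the abstract, while the medium threshold, the tie-breaking rule, and all identifiers (method names, DUF and superfamily labels) are hypothetical placeholders.

```python
from collections import defaultdict

# Only the "high" threshold is stated in the abstract (at least four methods).
HIGH_MIN_METHODS = 4
# Assumption for illustration: the abstract does not give the medium/low cut-off.
MEDIUM_MIN_METHODS = 2


def confidence(n_methods):
    """Map the number of agreeing methods to a confidence label."""
    if n_methods >= HIGH_MIN_METHODS:
        return "high"
    if n_methods >= MEDIUM_MIN_METHODS:
        return "medium"
    return "low"


def consensus_assignments(hits):
    """Pick, per DUF, the superfamily supported by the most methods.

    hits: iterable of (duf_id, method_name, superfamily_id) tuples.
    Returns {duf_id: (superfamily_id, n_methods, confidence_label)}.
    Ties are broken alphabetically by superfamily id (an arbitrary choice).
    """
    support = defaultdict(lambda: defaultdict(set))
    for duf, method, superfamily in hits:
        support[duf][superfamily].add(method)

    result = {}
    for duf, by_superfamily in support.items():
        # sorted() orders by superfamily id; max() keeps the first maximum.
        best, methods = max(sorted(by_superfamily.items()),
                            key=lambda item: len(item[1]))
        result[duf] = (best, len(methods), confidence(len(methods)))
    return result


# Hypothetical input: four methods agree on one superfamily for DUF_A.
example_hits = [
    ("DUF_A", "method_1", "superfamily_x"),
    ("DUF_A", "method_2", "superfamily_x"),
    ("DUF_A", "method_3", "superfamily_x"),
    ("DUF_A", "method_4", "superfamily_x"),
    ("DUF_A", "method_5", "superfamily_y"),
    ("DUF_B", "method_1", "superfamily_z"),
]
assignments = consensus_assignments(example_hits)
```

Counting distinct methods per superfamily, rather than raw hits, keeps a single method that reports the same superfamily several times from inflating the consensus.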

Conclusions: This study provides insights into the structure and potential function of nearly 20% of the DUFs. Using different computational approaches enables us to recognize distant relationships reliably, especially when the approaches converge on a common assignment, because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role remains non-trivial, with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.
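The "nearly 20%" figure can be checked against the counts given in the abstract. The short calculation below assumes the denominator is the 3,087 DUF/UPF families with no reported structure; the abstract does not state the denominator explicitly, so this is an inference rather than the authors' stated computation.

```python
# Counts taken from the abstract.
DUFS_WITHOUT_STRUCTURE = 3087   # DUF/UPF families with no reported 3D structure
DUFS_WITH_ASSIGNMENT = 614      # associated with a known structural domain
HIGH_CONFIDENCE_DUFS = 54       # supported by at least four methods


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumed denominator: the 3,087 structurally uncharacterized families.
coverage = percent(DUFS_WITH_ASSIGNMENT, DUFS_WITHOUT_STRUCTURE)
high_share = percent(HIGH_CONFIDENCE_DUFS, DUFS_WITH_ASSIGNMENT)

print(f"DUFs with a structural assignment: {coverage:.1f}%")
print(f"High-confidence share of assignments: {high_share:.1f}%")
```

Under that assumption the coverage rounds to 19.9%, consistent with "nearly 20%", and the high-confidence predictions make up roughly 9% of all assignments.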


Figures

Fig. 1
SCOP-superfamily assignments by each method: Venn diagram illustrating the number of remote similarity detections that are unique and common to the five methods in the study
Fig. 2
Venn diagrams representing the distribution of families in different biological kingdoms: (a) the distribution of all Pfam families, (b) the distribution of the 3,087 DUFs with no structural or functional information, and (c) the distribution of the 614 DUFs with SCOP superfamily assignments
Fig. 3
Distribution of combined remote similarity detection across different SCOP classes: (a) SCOP class distribution of the 628 domains recognized as related to the 614 DUF families, for each prediction method. (b) Bar plot of the frequency distribution of all SCOP superfamilies represented in structural domain recognition. Representative structures of the top 10 superfamilies are shown around the radial frequency plot of these superfamilies
Fig. 4
Modelled structure of DUF3050 using 1RCW as template. The cartoon representation of the structure is coloured by sequence conservation (from blue to white to red, where blue indicates poorly conserved residues and red indicates highly conserved residues). The metal-coordinating active site residues in the di-iron sites (Glu 93, His 105, Glu 177, His 207, Asp 211 and His 215) and another active site residue (Tyr 203) are depicted in ball-and-stick format
Fig. 5
DUF1572 – a putative metalloenzyme: (a) Structural alignment of YfiT from Bacillus subtilis (PDB ID: 1RXQ, shown in wheat colour) and modelled DUF1572 (light blue), highlighting the active site region. Conserved histidine residues coordinating the Zn metal ion in both structures are shown in ball-and-stick. (b) Multiple sequence alignment of representative sequences of DUF1572 with the sequence of 1RXQ, with the conserved histidine residues marked by red stars. A blue circle denotes the additionally conserved active site aspartate residue
Fig. 6
A multiple sequence alignment of the DUF2071 family with acetoacetate decarboxylase. Representative members of the family are aligned with the structural templates (3BGT:A, 3C8W:A). Hydrophobic residues are marked with blue circles and active site residues with red stars
Fig. 7
DUF2092 – a lipoprotein localization factor, LolA: (a) Multiple sequence alignment of representative sequences of DUF2092 and the bacterial lipoprotein localization factor. Residues involved in the hydrophobic cavity are marked with red triangles. (b) Modelled structure of a DUF2092 with the bacterial lipoprotein localization factor LolA (PDB ID: 1IWL) as template, depicting the prokaryotic lipoprotein and lipoprotein localization factor superfamily. Residues are coloured on a hydrophobicity scale ranging from 1.380 (most hydrophobic) to −2.530 (least hydrophobic), derived from the Eisenberg normalized hydrophobicity scale [59]
Fig. 8
DUF1636 – a thioredoxin-like fold: (a) Structural alignment of the crystal structure of wild-type thioredoxin-like [2Fe-2S] ferredoxin from Aquifex aeolicus (PDB ID: 1M2A, shown in wheat colour) and modelled DUF1636 (green). The conserved cysteine residues in the active site and in the loop region are shown in ball-and-stick. (b) Multiple sequence alignment of representative sequences of DUF1636 and 1M2A, with active site residues marked with blue stars. A red arrow highlights the conserved cysteine residue in the loop region. For clarity, only the first 70 residues, which contain the active site, are shown

References

    1. Eisenhaber F. A decade after the first full human genome sequencing: when will we understand our own genome? J Bioinform Comput Biol. 10(5):1271001. doi: 10.1142/S0219720012710011.
    2. Jaroszewski L, Li Z, Krishna SS, Bakolitsa C, Wooley J, Deacon AM, et al. Exploration of uncharted regions of the protein universe. PLoS Biol. 2009;7(9). doi: 10.1371/journal.pbio.1000205.
    3. Sonnhammer EL, Eddy SR, Durbin R. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins. 1997;28(3):405–20. doi: 10.1002/(SICI)1097-0134(199707)28:3<405::AID-PROT10>3.0.CO;2-L.
    4. Goodacre NF, Gerloff DL, Uetz P. Protein domains of unknown function are essential in bacteria. mBio. 2013;5(1):e00744–13.
    5. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, et al. Pfam: clans, web tools and services. Nucleic Acids Res. 2006;34(Database issue):D247–51. doi: 10.1093/nar/gkj149.
