Biol Direct. 2012 Dec 14;7:46.
doi: 10.1186/1745-6150-7-46.

Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer

Yuri I Wolf et al. Biol Direct.

Abstract

Background: Collections of Clusters of Orthologous Genes (COGs) provide indispensable tools for comparative genomic analysis, evolutionary reconstruction and functional annotation of new genomes. Initially, COGs were made for all complete genomes of cellular life forms that were available at the time. However, with the accumulation of thousands of complete genomes, construction of a comprehensive COG set has become extremely computationally demanding and prone to error propagation, necessitating the switch to taxon-specific COG collections. Previously, we reported the collection of COGs for 41 genomes of Archaea (arCOGs). Here we present a major update of the arCOGs and describe evolutionary reconstructions to reveal general trends in the evolution of Archaea.

Results: The updated version of the arCOG database incorporates 91% of the pangenome of 120 archaea (251,032 protein-coding genes altogether) into 10,335 arCOGs. Using this new set of arCOGs, we performed maximum likelihood reconstruction of the genome content of archaeal ancestral forms and of gene gain and loss events in archaeal evolution. This reconstruction shows that the last common ancestor of the extant Archaea was an organism of greater complexity than most extant archaea, probably with over 2,500 protein-coding genes. The subsequent evolution of almost all archaeal lineages was apparently dominated by gene loss resulting in genome streamlining. Overall, in the evolution of Archaea as well as of a representative set of bacteria that was similarly analyzed for comparison, gene losses are estimated to outnumber gene gains by at least 4 to 1. Analysis of specific patterns of gene gain in Archaea shows that, although some groups, in particular Halobacteria, acquire substantially more genes than others, on the whole, gene exchange between major groups of Archaea appears to be largely random, with no major 'highways' of horizontal gene transfer.
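
The gain and loss counts summarized above come from a maximum likelihood reconstruction that assigns probabilities to ancestral states. As a much simpler illustration of the bookkeeping involved, the sketch below assumes that the gene-family content of every node (ancestral and extant) is already fixed as hard presence/absence sets and tallies gains and losses branch by branch. The function names, the child-to-parent `tree` dictionary and the toy data are hypothetical; this is not the authors' method.

```python
from typing import Dict, Set, Tuple


def branch_gains_losses(parent: Set[str], child: Set[str]) -> Tuple[int, int]:
    """Count families gained and lost on one parent -> child branch."""
    gains = len(child - parent)   # present in the descendant but not the ancestor
    losses = len(parent - child)  # present in the ancestor but not the descendant
    return gains, losses


def total_gains_losses(
    tree: Dict[str, str], content: Dict[str, Set[str]]
) -> Tuple[int, int]:
    """Sum gains and losses over all branches.

    `tree` maps every non-root node to its parent; `content` maps every node,
    ancestral or extant, to its set of gene families (hard states, no probabilities).
    """
    total_gains = total_losses = 0
    for child, parent in tree.items():
        gains, losses = branch_gains_losses(content[parent], content[child])
        total_gains += gains
        total_losses += losses
    return total_gains, total_losses


if __name__ == "__main__":
    # Toy data (invented): a root with two descendants.
    tree = {"X": "root", "Y": "root"}
    content = {
        "root": {"A", "B", "C", "D"},
        "X": {"A", "B", "E"},
        "Y": {"A", "C"},
    }
    gains, losses = total_gains_losses(tree, content)
    print(gains, losses, losses / gains)  # 1 4 4.0
```

In this toy tree one gain and four losses are counted, a loss-to-gain ratio of 4; the numbers are invented for illustration and are not taken from the paper.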

Conclusions: The updated collection of arCOGs is expected to become a key resource for comparative genomics, evolutionary reconstruction and functional annotation of new archaeal genomes. Because the conserved core of archaeal genes appears to be stabilizing despite the major increase in the number of genomes, the major evolutionary trends revealed here have a chance of standing the test of time.

Reviewers: This article was reviewed by (for complete reviews see the Reviewers' Reports section): Dr. PLG, Prof. PF, Dr. PL (nominated by Prof. JPG).


Figures

Figure 1
A commonality plot for archaeal protein-coding genes. Diamonds show the number of arCOGs that include genes from the given number of distinct genomes. Dashed red lines: decomposition of the data into three exponential components (“cloud”, “shell” and “core” [37,38]); solid red line: the sum of the three components.
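
The diamonds in a commonality plot can be tabulated directly from cluster memberships. A minimal sketch, assuming each gene is supplied as a hypothetical `(cluster_id, genome_id)` pair; paralogs from the same genome are counted once, because only the number of distinct genomes matters. The decomposition into exponential components is not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def commonality_counts(memberships: Iterable[Tuple[str, str]]) -> Dict[int, int]:
    """Map k -> number of clusters containing genes from exactly k distinct genomes."""
    genomes_per_cluster: Dict[str, Set[str]] = {}
    for cluster_id, genome_id in memberships:
        # A set ignores repeated genomes, so paralogs count once.
        genomes_per_cluster.setdefault(cluster_id, set()).add(genome_id)
    counts = Counter(len(genomes) for genomes in genomes_per_cluster.values())
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Toy memberships (invented), one pair per gene.
    toy = [
        ("c1", "g1"), ("c1", "g2"), ("c1", "g2"),  # paralog pair in g2
        ("c2", "g1"),
        ("c3", "g1"), ("c3", "g2"), ("c3", "g3"),
        ("c4", "g2"),
    ]
    print(commonality_counts(toy))  # {1: 2, 2: 1, 3: 1}
```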
Figure 2
Phylogeny of universal archaeal ribosomal proteins. The approximate maximum likelihood tree was reconstructed using FastTree [51,52].
Figure 3
Phyletic patterns and inferred gene gain patterns in Archaea. The most frequent phyletic patterns are shown to the right of the tree as blocks of genomes where the gene is present. The inferred number of gene gains is indicated for the tree branches. The most frequent gain patterns are shown as green dots associated with tree branches. The pattern within the Methanosarcinales clade refers to the Methanosarcina genus. The fuzzy pattern to the left of the tree root refers to the “zero-gain” inference without a confident assignment to any particular clade.
Figure 4
Inferred ancestral genomes in Archaea. The square boxes at the bases of clades indicate the number of families in the inferred ancestral genomes; the rectangles at the tips of clades indicate the number of families in the extant genomes within the clade. Square boxes at the tips indicate single genomes; rectangular boxes indicate multiple genomes.
Figure 5
Distribution of the gain patterns by the number of gains. a. Number of confidently predicted gains in a pattern (p > 0.5). b. The sum of posterior gain probabilities in a pattern (kernel-smoothed probability density). Dotted line: the best-fitting exponential.
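
The two summaries in panels a and b differ only in how the per-branch posterior gain probabilities of a gene family are aggregated: panel a counts branches whose posterior exceeds 0.5, whereas panel b sums the posteriors (which can be read as an expected number of gains). A minimal sketch, assuming those per-branch probabilities are already available as a list of floats; the function name and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def gain_summaries(posteriors: Sequence[float], threshold: float = 0.5) -> Tuple[int, float]:
    """Summarize per-branch posterior gain probabilities for one gene family.

    Returns (number of confident gains, sum of posterior gain probabilities).
    """
    # Panel a: branches where a gain is confidently predicted (p > threshold).
    confident = sum(1 for p in posteriors if p > threshold)
    # Panel b: the plain sum of the posterior probabilities.
    expected = float(sum(posteriors))
    return confident, expected


if __name__ == "__main__":
    # Invented posteriors on four branches for a single gene family.
    print(gain_summaries([0.9, 0.6, 0.3, 0.2]))
```

A family with many weakly supported gains can therefore score zero in panel a yet contribute a substantial total to panel b.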
Figure 6
The byways of horizontal gene transfer among Archaea. Lines connect the clades that form the most frequent phyletic patterns with two inferred gains (different colors are used for visual differentiation only). One of the two clades is the likely origin of the respective arCOG and the other is the likely acceptor of the HGT.

References

    1. Kristensen DM, Wolf YI, Mushegian AR, Koonin EV. Computational methods for gene orthology inference. Brief Bioinform. 2011;12(5):379–391.
    2. Kuzniar A, van Ham RC, Pongor S, Leunissen JA. The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 2008;24(11):539–551.
    3. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005;39:309–338.
    4. Lynch M, Katju V. The altered evolutionary trajectories of gene duplicates. Trends Genet. 2004;20(11):544–549.
    5. Ohno S. Evolution by gene duplication. Berlin-Heidelberg-New York: Springer-Verlag; 1970.
