Review
Brief Bioinform. 2019 Jul 19;20(4):1063-1070.
doi: 10.1093/bib/bbx117.

Microbial genome analysis: the COG approach

Michael Y Galperin et al. Brief Bioinform.

Abstract

For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.

Keywords: comparative genomics; enzyme evolution; genome annotation; orthologs; paralogs.

Figures

Figure 1
Evolution of the COG system. The numbers in parentheses indicate the number of bacterial, archaeal and eukaryotic genomes, respectively, included in the respective COG release [1–6].
Figure 2
Proteome coverage by the current version of COGs. Archaeal and bacterial phyla and selected classes of Firmicutes and Proteobacteria are listed as in the latest release of the COG database [6]. The orange and blue columns show the fractions of the respective proteomes covered by COGs in each taxonomic group (including R- and S-type COGs that consist of poorly characterized or uncharacterized genes), averaged over the members of that group in the COGs (the respective numbers are shown in parentheses). The ‘Other archaea’ group includes two genomes representing, respectively, Kor- and Nanoarchaeota; the ‘Other bacteria’ group includes members of Deferribacteres, Nitrospirae, Verrucomicrobia and other sparsely sampled phyla, as well as representatives of several candidate phyla. The bright yellow rectangles on top of the archaeal columns indicate the additional coverage of the archaeal proteomes in the latest version of arCOGs [10]. The hatched rectangles indicate the additional coverage of the archaeal and bacterial proteomes in the ATGC-COGs from the latest version of the ATGCs database [64].
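The per-group averaging described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes each genome is summarized by its total protein count and the number of its proteins assigned to any COG; the record layout, the function names and the numbers below are hypothetical and are not taken from the COG database. The sketch averages per-genome fractions, as the caption words it, rather than pooling proteins across genomes.

```python
from statistics import mean


def coverage_fraction(proteome_size, proteins_in_cogs):
    """Fraction of one proteome assigned to any COG (hypothetical helper)."""
    if proteome_size <= 0:
        raise ValueError("proteome_size must be positive")
    if not 0 <= proteins_in_cogs <= proteome_size:
        raise ValueError("proteins_in_cogs must lie between 0 and proteome_size")
    return proteins_in_cogs / proteome_size


def mean_coverage_by_group(genomes):
    """Average the per-genome coverage within each taxonomic group.

    Returns {group: (mean_fraction, number_of_genomes)}.
    """
    fractions = {}
    for genome in genomes:
        frac = coverage_fraction(genome["proteins"], genome["in_cogs"])
        fractions.setdefault(genome["group"], []).append(frac)
    return {group: (mean(vals), len(vals)) for group, vals in fractions.items()}


# Made-up illustrative records, NOT real COG statistics.
toy_genomes = [
    {"group": "Group X", "proteins": 1000, "in_cogs": 800},
    {"group": "Group X", "proteins": 2000, "in_cogs": 1200},
    {"group": "Group Y", "proteins": 500, "in_cogs": 250},
]

for group, (fraction, n) in mean_coverage_by_group(toy_genomes).items():
    print(f"{group} ({n}): {fraction:.1%}")
```

Averaging per-genome fractions weights every genome equally regardless of proteome size, so the result can differ from a pooled ratio of all COG-assigned proteins to all proteins in the group.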

References

    1. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science 1997;278:631–7.
    2. Koonin EV, Tatusov RL, Galperin MY. Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol 1998;8:355–63.
    3. Tatusov RL, Galperin MY, Natale DA, et al. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000;28:33–6.
    4. Tatusov RL, Natale DA, Garkavtsev IV, et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001;29:22–8.
    5. Tatusov RL, Fedorova ND, Jackson JD, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics 2003;4:41.
