Nucleic Acids Res. 2025 Jan 6;53(D1):D356-D363.
doi: 10.1093/nar/gkae983.

COG database update 2024

Michael Y Galperin et al. Nucleic Acids Res.

Abstract

The Clusters of Orthologous Genes (COG) database, originally created in 1997, has been updated to reflect the constantly growing collection of completely sequenced prokaryotic genomes. This update increased the genome coverage from 1309 to 2296 species, including 2103 bacteria and 193 archaea, in most cases with a single representative genome per genus. This set covers all genera of bacteria and archaea that included organisms with 'complete genomes' as per NCBI databases in November 2023. The number of COGs has been expanded from 4877 to 4981, primarily by including protein families involved in bacterial protein secretion. Accordingly, COG pathways and functional groups now include secretion systems of types II through X, as well as Flp/Tad and type IV pili. These groupings allow straightforward identification and examination of the prokaryotic lineages that encompass, or lack, a particular secretion system. Other developments include improved annotations for the rRNA and tRNA modification proteins, multi-domain signal transduction proteins, and some previously uncharacterized protein families. The new version of COGs is available at https://www.ncbi.nlm.nih.gov/research/COG, as well as on the NCBI FTP site https://ftp.ncbi.nlm.nih.gov/pub/COG/, which also provides archived data from previous COG releases.
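
As an illustration of how a functional grouping can be used to separate lineages that encompass a secretion system from those that lack it, the following is a minimal Python sketch. The genome names, lineage labels, COG identifiers and the `classify_lineages` function are hypothetical placeholders and do not reflect the actual COG data or file formats. It also assumes, as a simplification, that a system counts as present only when a genome carries every COG in the group.

```python
from collections import defaultdict


def classify_lineages(genome_cogs, genome_lineage, system_cogs, min_fraction=1.0):
    """Return (lineages with the system, lineages without it).

    A lineage is counted as having the system if at least one of its
    genomes carries at least `min_fraction` of the COGs in the group.
    All inputs are hypothetical in-memory mappings, not COG file formats.
    """
    system_cogs = set(system_cogs)
    if not system_cogs:
        raise ValueError("system_cogs must not be empty")

    has_system = defaultdict(bool)
    for genome, cogs in genome_cogs.items():
        lineage = genome_lineage[genome]
        found = len(system_cogs & set(cogs)) / len(system_cogs)
        has_system[lineage] = has_system[lineage] or found >= min_fraction

    present = sorted(name for name, flag in has_system.items() if flag)
    absent = sorted(name for name, flag in has_system.items() if not flag)
    return present, absent


# Hypothetical toy data: placeholder COG IDs, not real COG accessions.
genome_cogs = {
    "genome_1": {"COG_A", "COG_B", "COG_C"},
    "genome_2": {"COG_A"},
    "genome_3": {"COG_B", "COG_C", "COG_D"},
}
genome_lineage = {
    "genome_1": "lineage_X",
    "genome_2": "lineage_Y",
    "genome_3": "lineage_X",
}
secretion_system = {"COG_A", "COG_B"}

present, absent = classify_lineages(genome_cogs, genome_lineage, secretion_system)
print("encompass:", present)
print("lack:", absent)
```

Lowering `min_fraction` would treat partially complete systems as present, which may or may not be appropriate depending on which components of a given system are essential.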

Figures

Graphical Abstract
Figure 1.
Distribution of the COGs by the number of represented genomes. (A) Plot of the fraction of the genomes represented in a COG versus the number of COGs containing the given fraction of genomes. (B) Cumulative distribution of the number of genomes in COGs.
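
The two quantities named in this caption can be sketched in Python as follows. This is only an illustration on made-up data: the function names, the placeholder COG and genome identifiers, and the choice of a "COGs with at most n genomes" cumulative count are assumptions, and the figure in the paper may be computed differently.

```python
from collections import Counter


def coverage_fractions(cog_genomes, total_genomes):
    """Fraction of all genomes that is represented in each COG."""
    return {cog: len(set(genomes)) / total_genomes
            for cog, genomes in cog_genomes.items()}


def genome_count_distributions(cog_genomes):
    """Return (histogram, cumulative) over the number of genomes per COG.

    histogram maps a genome count n to the number of COGs with exactly n genomes;
    cumulative is a sorted list of (n, number of COGs with at most n genomes).
    """
    histogram = Counter(len(set(genomes)) for genomes in cog_genomes.values())
    cumulative = []
    running = 0
    for n in sorted(histogram):
        running += histogram[n]
        cumulative.append((n, running))
    return dict(histogram), cumulative


# Hypothetical toy data: four placeholder COGs over four placeholder genomes.
cog_genomes = {
    "COG_A": {"g1", "g2", "g3", "g4"},
    "COG_B": {"g1", "g2"},
    "COG_C": {"g3", "g4"},
    "COG_D": {"g1"},
}

fractions = coverage_fractions(cog_genomes, total_genomes=4)
histogram, cumulative = genome_count_distributions(cog_genomes)
print(fractions)
print(histogram)
print(cumulative)
```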
