RefSeq Release 227 is Available!


Check out RefSeq release 227, now available online and from the FTP site. You can access RefSeq data through NCBI Datasets. The release is provided in several directories, both as a complete dataset and divided into logical groupings.

What’s included in this release?

As of November 4, 2024, this full release incorporates genomic, transcript, and protein data containing:

  • 497,549,107 records, including
      • 377,783,847 proteins
      • 66,987,567 RNAs
  • Sequences from 159,324 organisms

Lineage updates for prokaryotes

Lineage updates affecting all prokaryotes are in progress. This affects the lineage shown in the ORGANISM block of the flatfile records.
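If you track lineages in your own pipelines, it may help to pull the lineage out of the ORGANISM block before and after the update and compare the two. The following is a minimal Python sketch, not an NCBI tool. It assumes the usual flatfile layout, where the `ORGANISM` line carries the organism name and the indented lines after it carry a semicolon-separated lineage ending in a period, and it assumes the name fits on a single line. The sample record text and the function name `parse_organism_block` are hypothetical and for illustration only.

```python
# Illustrative flatfile fragment (hypothetical; not copied from release 227).
SAMPLE_RECORD = """\
SOURCE      Escherichia coli str. K-12 substr. MG1655
  ORGANISM  Escherichia coli str. K-12 substr. MG1655
            Bacteria; Pseudomonadota; Gammaproteobacteria; Enterobacterales;
            Enterobacteriaceae; Escherichia.
REFERENCE   1  (bases 1 to 100)
"""


def parse_organism_block(flatfile_text):
    """Return (organism_name, lineage_list) for the first ORGANISM block.

    Assumes the organism name fits on the ORGANISM line itself and that
    lineage continuation lines are indented by at least 12 spaces.
    """
    lines = flatfile_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("  ORGANISM"):
            name = line[len("  ORGANISM"):].strip()
            continuation = []
            for follow in lines[i + 1:]:
                # Stop at the first line that is not an indented continuation.
                if not follow.startswith(" " * 12):
                    break
                continuation.append(follow.strip())
            # Join wrapped lines, drop the trailing period, split on ';'.
            lineage_text = " ".join(continuation).rstrip(".")
            lineage = [t.strip() for t in lineage_text.split(";") if t.strip()]
            return name, lineage
    return None, []


name, lineage = parse_organism_block(SAMPLE_RECORD)
print(name)
print(" > ".join(lineage))
```

Comparing the `lineage` lists from an older and a newer copy of the same record is one simple way to see which ranks changed.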

Genome contamination reports

Comprehensive reports of contamination found in prokaryote and eukaryote genomes are now available for both GenBank and RefSeq genomes. This information can be used to better understand possible issues that may affect your studies, and to select against contaminated assemblies or remove contaminant sequences.

New MANE release

The MANE v1.4 dataset is now available, including the addition of 50 non-coding genes associated with human diseases.

New eukaryotic genome annotations

This release contains new or updated annotations generated by NCBI’s eukaryotic genome annotation pipeline for 41 species.

Stay up to date

RefSeq is part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration. Follow us on social media @NCBI and join our mailing list to keep up to date with RefSeq and other CGR news.

Questions?

If you have questions or would like to provide feedback, please reach out to us! 
