NCBI Datasets

NCBI Genome Data Processing

Information on NCBI genome data processing, annotation pipelines and data models.

NCBI Genome Data Processing
  • Assembly Versioning and Status
  • Selecting Reference Genomes
  • Genomes Selected for RefSeq Annotation
  • Genome Notes
  • What is Type Material?
NCBI RefSeq Genome Annotation Pipelines
  • NCBI Eukaryotic Annotation Pipeline (EGAP)
  • NCBI Prokaryotic Annotation Pipeline (PGAP)
NCBI Genome Quality Analysis
  • NCBI Contamination Screening
  • Average Nucleotide Identity (ANI)
  • Benchmarking Universal Single-Copy Orthologs (BUSCO) Analysis
  • CheckM Analysis
The NCBI Genome Assembly Data Model
NCBI Genomes FTP
How Are Orthologs Calculated?

Generated February 26, 2025