taxon
Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus
Name
datasets download virus genome taxon - Request genome data by taxonomic ID or name. Allowed taxa are limited to those under Coronaviridae, e.g. sars2 or betacoronavirus
Synopsis
datasets download virus genome taxon <taxon> [flags]
Description
Download a coronavirus genome dataset by taxon (NCBI Taxonomy ID, or scientific or common name, for any taxonomic group in the coronavirus family). Coronavirus genome data packages include genome, CDS, and protein sequences, annotation, and a detailed data report. Datasets are downloaded as a zip file.
The default coronavirus genome dataset includes the following files (if available):
- genomic.fna (genomic sequences)
- cds.fna (nucleotide coding sequences)
- protein.faa (protein sequences)
- data_report.jsonl (data report with viral metadata)
- virus_dataset.md (README containing details on sequence file data content and other information)
- dataset_catalog.json (a list of files and file types included in the dataset)
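Because each file is included only "if available", it can help to check which of the listed files a downloaded archive actually contains. The following is a minimal sketch, not part of the datasets CLI: `report_missing` is a hypothetical helper that reads a list of archive paths on standard input and prints the expected file names that do not appear. It matches on file name only, so it makes no assumption about the directory layout inside the zip.

```shell
# report_missing (hypothetical helper): read archive paths on stdin and
# print each expected file name from the list above that is not present.
report_missing() {
  listing=$(cat)
  for f in genomic.fna cds.fna protein.faa data_report.jsonl \
           virus_dataset.md dataset_catalog.json; do
    case "$listing" in
      *"$f"*) ;;            # file name appears somewhere in the listing
      *) echo "$f" ;;       # not found: report it
    esac
  done
}
```

Assuming the archive was saved under the default name and `unzip` is installed, one way to use it is `unzip -Z1 ncbi_dataset.zip | report_missing`, where `-Z1` lists one archive path per line.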
Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.
Examples
datasets download virus genome taxon sars-cov-2 --host dog
datasets download virus genome taxon coronaviridae --host "manis javanica"
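Filters can be combined in one invocation. The sketch below uses only flags listed under Options; the taxon, host, date, and output file name are illustrative assumptions, not recommended values. It prints the command for review and runs it only if `RUN_DOWNLOAD=1` is set and the `datasets` executable is on the PATH.

```shell
# Illustrative values (assumptions); substitute your own.
TAXON="sars-cov-2"
HOST="human"
SINCE="01/01/2024"                  # MM/DD/YYYY, as --released-since expects
OUT="sars2_human_complete.zip"

# Assemble the command as text so it can be inspected before running.
CMD="datasets download virus genome taxon $TAXON --complete-only --host $HOST --released-since $SINCE --filename $OUT"
echo "$CMD"

# Run only when explicitly requested and the CLI is installed.
if [ "${RUN_DOWNLOAD:-0}" = "1" ] && command -v datasets >/dev/null 2>&1; then
  datasets download virus genome taxon "$TAXON" --complete-only \
    --host "$HOST" --released-since "$SINCE" --filename "$OUT"
fi
```

Host names that contain spaces need quoting, as in the "manis javanica" example above; the real invocation quotes every variable for that reason.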
Options
      --annotated               limit to annotated coronavirus genomes
      --api-key string          NCBI Datasets API Key
      --complete-only           limit to complete coronavirus genomes
      --exclude-cds             exclude cds.fna (CDS sequence file)
      --exclude-protein         exclude protein.faa (protein sequence file)
      --exclude-seq             exclude genomic.fna (genomic sequence file)
      --filename string         specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
      --geo-location string     limit to coronavirus genomes isolated from a specified geographic location (continent, country or U.S. state)
  -h, --help                    help for taxon
      --host string             limit to coronavirus genomes isolated from a specified host (NCBI Taxonomy ID, scientific or common name at any taxonomic rank)
      --lineage string          limit to SARS-CoV-2 genomes classified as the specified lineage (variant) by pangolin using the pangoLEARN algorithm
      --no-progressbar          hide progress bar
      --refseq                  limit to RefSeq coronavirus genomes
      --released-since string   limit to coronavirus genomes released after a specified date (MM/DD/YYYY)
      --updated-since string    limit to coronavirus genomes updated after a specified date (MM/DD/YYYY)
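The date flags expect MM/DD/YYYY, while many scripts keep dates as YYYY-MM-DD. The following is a small sketch of a conversion helper; `to_mdy` is a hypothetical function, not part of the datasets CLI, and it assumes well-formed input without validating it.

```shell
# to_mdy (hypothetical helper): convert YYYY-MM-DD to the MM/DD/YYYY
# form expected by --released-since and --updated-since.
to_mdy() {
  y=${1%%-*}        # text before the first '-'  -> year
  rest=${1#*-}      # text after the first '-'   -> MM-DD
  m=${rest%%-*}     # month
  d=${rest#*-}      # day
  echo "$m/$d/$y"
}

# Example: build an --updated-since argument from an ISO-style date.
SINCE=$(to_mdy 2024-03-11)
echo "datasets download virus genome taxon coronaviridae --updated-since $SINCE"
```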
Generated March 11, 2025