datasets download genome - download a genome dataset
Download a genome dataset including genome, transcript and protein sequences, annotation, and a detailed data report. Genome datasets can be specified by NCBI Assembly or BioProject accession, or by taxon. Datasets are downloaded as a zip file.
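The default genome dataset includes the following files (if available):
genomic.fna (genomic sequence file)
rna.fna (transcript sequence file)
protein.faa (protein sequence file)
cds_from_genomic.fna (genomic cds file)
genomic.gff (gff3 annotation file)
a detailed data report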
Refer to NCBI’s download and install documentation for information about getting started with the command-line tools.
Sample commands:
datasets download genome accession GCF_000001405.40 --chromosomes X,Y --exclude-gff3 --exclude-rna
datasets download genome taxon "bos taurus" --dehydrated
datasets download genome taxon human --assembly-level chromosome,complete_genome --dehydrated
datasets download genome taxon mouse --search C57BL/6J --search "Broad Institute" --dehydrated
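The `--dehydrated` sample commands above fetch only the data report and the locations of the data files; the rehydrate command then retrieves the files themselves. The following is a minimal sketch of that round trip, assuming `datasets` and `unzip` are on `PATH`. The archive name, the output directory, the `run` wrapper and the `DRY_RUN` switch are illustrative choices, not part of the CLI. By default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative switch (not a datasets option): 1 = print commands, 0 = run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: show the command with shell quoting, or execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%q ' "$@"
    echo
  else
    "$@"
  fi
}

# Hypothetical helper: download a dehydrated archive for a taxon, unpack it,
# then let `datasets rehydrate` fetch the data files the archive points to.
fetch_dehydrated() {
  local taxon="$1" zip_file="$2" out_dir="$3"
  run datasets download genome taxon "$taxon" --dehydrated --filename "$zip_file"
  run unzip -o "$zip_file" -d "$out_dir"
  run datasets rehydrate --directory "$out_dir"
}

fetch_dehydrated "bos taurus" bos_taurus.zip bos_taurus
```

Run it as `DRY_RUN=0 bash script.sh` to perform the download instead of printing it. Keeping the dehydrated archive small and rehydrating separately is useful when the full data files are large.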
Flags:
-a, --annotated only include genomes with annotation
--api-key string NCBI Datasets API Key
--assembly-level string restrict assemblies to a comma-separated list of one or more of: chromosome, complete_genome, contig, scaffold
--assembly-source string restrict assemblies to refseq or genbank only
--chromosomes strings limit to a specified, comma-delimited list of chromosomes (default [all])
--dehydrated download a dehydrated zip archive including the data report and locations of data files (use the rehydrate command to retrieve data files).
--exclude-genomic-cds exclude cds_from_genomic.fna (genomic cds file)
--exclude-gff3 exclude genomic.gff (gff3 annotation file)
--exclude-protein exclude protein.faa (protein sequence file)
--exclude-rna exclude rna.fna (transcript sequence file)
--exclude-seq exclude genomic.fna (genomic sequence file)
--filename string specify a custom file name for the downloaded dataset (default "ncbi_dataset.zip")
-h, --help help for genome
--include-gbff include genomic.gbff (GenBank flat file sequence and annotation), if available
--include-gtf include genomic.gtf (gtf annotation file), if available
--no-progressbar hide progress bar
--reference limit to reference and representative (GCF_ and GCA_) assemblies
--released-before string only include genomes that have been released before a specified date (MM/DD/YYYY)
--released-since string only include genomes that have been released after a specified date (MM/DD/YYYY)
--search strings only include genomes that have the specified text in the searchable fields: species and infraspecies, assembly name and submitter. To provide multiple strings, '--search' can be included multiple times
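The `--released-before` and `--released-since` flags take dates as MM/DD/YYYY. If your dates are in ISO 8601 form (YYYY-MM-DD), a small conversion helper avoids swapped fields. A minimal sketch follows; `iso_to_mdy` is a hypothetical helper, not part of `datasets`, the script only prints the resulting command line, and it assumes the two date filters can be combined in one call.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: convert YYYY-MM-DD to MM/DD/YYYY.
# Only the shape of the string is checked, not whether the date exists.
iso_to_mdy() {
  local iso="$1"
  local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
  if [[ ! "$iso" =~ $re ]]; then
    echo "expected YYYY-MM-DD, got: $iso" >&2
    return 1
  fi
  printf '%s/%s/%s\n' "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[1]}"
}

since="$(iso_to_mdy 2020-01-01)"    # 01/01/2020
before="$(iso_to_mdy 2021-12-31)"   # 12/31/2021

# Print (rather than run) the resulting command line.
printf '%q ' datasets download genome taxon human \
  --released-since "$since" --released-before "$before" --dehydrated
echo
```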
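Because each search string needs its own `--search` flag, a list of terms has to be expanded into repeated flag/value pairs. A minimal sketch, assuming bash arrays; `build_search_args` and the global `search_args` array are hypothetical names. The script prints the command line with `printf %q` so the quoting of multi-word terms is visible.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: turn each term into its own '--search TERM' pair,
# storing the result in the global array search_args.
build_search_args() {
  local term
  search_args=()
  for term in "$@"; do
    search_args+=(--search "$term")
  done
}

build_search_args "C57BL/6J" "Broad Institute"

# Print the resulting command line; multi-word terms stay one argument.
printf '%q ' datasets download genome taxon mouse "${search_args[@]}" --dehydrated
echo
# To actually run it:
# datasets download genome taxon mouse "${search_args[@]}" --dehydrated
```

This reproduces the mouse sample command: quoting `"${search_args[@]}"` keeps "Broad Institute" as a single argument rather than splitting it into two words.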
Available commands:
accession download a genome dataset by NCBI Assembly or BioProject accession
taxon download a genome dataset by taxon (NCBI Taxonomy ID, scientific or common name at any tax rank)
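The two subcommands take different kinds of identifiers. The following is a rough sketch of choosing between them. It assumes assembly accessions look like `GCF_` or `GCA_` followed by nine digits and a version (as in `GCF_000001405.40`) and BioProject accessions look like `PRJNA`, `PRJEB` or `PRJDB` followed by digits. Anything else is treated as a taxon (Taxonomy ID or name). `pick_subcommand` is hypothetical, and its patterns are a heuristic, not NCBI's validation rules.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "accession" or "taxon" for an identifier.
pick_subcommand() {
  local id="$1"
  local asm_re='^GC[AF]_[0-9]{9}\.[0-9]+$'   # assumed assembly accession shape
  local bp_re='^PRJ(NA|EB|DB)[0-9]+$'        # assumed BioProject accession shape
  if [[ "$id" =~ $asm_re || "$id" =~ $bp_re ]]; then
    echo accession
  else
    echo taxon
  fi
}

for id in GCF_000001405.40 PRJNA31257 9606 "bos taurus"; do
  # Print (rather than run) the command each identifier would map to.
  printf '%q ' datasets download genome "$(pick_subcommand "$id")" "$id" --dehydrated
  echo
done
```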