Coming soon! Changes to NCBI Datasets command-line tool in version 14 (CLIv14.0.0)

In October 2022, NCBI Datasets will release version 14 of our datasets and dataformat command-line tools. This release will contain breaking changes to the command syntax and to the content of the data packages and data reports. Thank you for the feedback that inspired these new features. We hope they will improve your experience!

We will continue to support CLI v13.x, although new features and improvements will be available only in CLI v14.0.0 and later releases.

NCBI Datasets supports the NIH Comparative Genomics Resource (CGR), an NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms. Join our mailing list to keep up to date with NCBI Datasets and other CGR news.

More details

How is version 14 of the Datasets command-line tools (CLI v14.x) different from CLI v13.x and previous versions? 

  • Provides easier access to metadata
  • Contains smaller data packages (faster downloads)
  • Offers expanded content for virus genomes
  • Delivers genome sequences as a single file by default
  • Uses simpler command syntax: data files are now included using the --include flag

Easier access to metadata
You will be able to print all metadata to the screen, redirect it to a file, or pipe it to the dataformat command-line tool to generate a customized table. Currently, some metadata is only available as part of a downloaded data package. In addition, metadata formats will be standardized across services, and all metadata schemas will be documented.
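
As a rough sketch of the piping workflow described above, the script below streams a genome data report into dataformat and saves the resulting table. The --as-json-lines flag, the "tsv genome" subcommand, the field names accession and organism-name, and the output file name human_reference.tsv are assumptions that do not appear in this announcement, so check them against the CLI v14 documentation once it is released.

```shell
#!/bin/sh
# Assumed field names; confirm them against the documented metadata schema.
fields="accession,organism-name"

if command -v datasets >/dev/null 2>&1 && command -v dataformat >/dev/null 2>&1; then
    # Stream the metadata as JSON Lines and convert it into a tab-separated table.
    # The flags and the output file name are assumptions, not taken from the announcement.
    datasets summary genome taxon human --reference --as-json-lines |
        dataformat tsv genome --fields "$fields" > human_reference.tsv
else
    # Nothing is downloaded when the tools are missing.
    echo "datasets and dataformat were not found on PATH" >&2
fi
```

Dropping the redirection prints the table to the screen instead of writing it to a file.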

Smaller data packages
Data packages will include a smaller set of files by default, so downloads will be faster and more reliable. For example, the default genome data package will include only genome sequence and the data report file. Users also will have the option to include other sequence and annotation files, as well as the sequence report file.

Expanded content for virus genomes
All genomes in NCBI Virus will be available through the datasets virus subcommand.

Genome sequences delivered as a single file by default
Genome sequences will be delivered as a single file by default. You will have the option of requesting genome sequences as separate files by chromosome using:

 --chromosomes

Simpler command syntax
We have simplified the way that specific data files and data reports (metadata) are requested. Data files will be specified using a single --include flag instead of multiple exclude flags. For example, to get the genome and protein sequences for the current human reference genome, try:

datasets download genome taxon human --reference --include genome,protein

You will also have the option to add data reports to data packages using the same flag:

--include
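
When the list of file types is assembled in a script, it helps to join the values with commas and no spaces, because the shell would pass anything after a space as a separate argument. The sketch below is a hypothetical dry run that only prints the download command from the example above. The variable name include_arg is illustrative, and the assumption that --include takes a single comma-separated value should be checked against the released CLI.

```shell
#!/bin/sh
# File types to request; "genome" and "protein" come from the example above.
set -- genome protein

# Join the positional parameters with commas and no spaces.
# Assumption: --include expects one comma-separated value.
include_arg=$(IFS=,; printf '%s' "$*")

# Dry run: print the command instead of executing it.
# Remove "echo" to run it once CLI v14 is installed.
echo datasets download genome taxon human --reference --include "$include_arg"
# prints: datasets download genome taxon human --reference --include genome,protein
```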

We are excited to launch these new features! Stay tuned for more information.
