NIH’s COVID-focused Sequence Read Archive (SRA) datasets are now open access on AWS!

While searching for SARS-CoV-2 sequences, have you longed for a COVID-focused SRA dataset? Great news — now there is one! We are happy to announce the addition of COVID-focused datasets (including source and normalized SRA file formats) to the AWS Public Dataset Program. These data can now be explored at the Registry of Open Data on AWS.

Researchers can now access more than 13K SRA runs containing Coronaviridae (CoV) content. These runs were identified with the SRA Taxonomy Analysis Tool, which uses a k-mer-based approach to identify organismal content.
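For readers unfamiliar with k-mer-based identification, here is a toy sketch of the general idea: break a read into overlapping substrings of length k and check how many of them occur in a reference set. This is not the SRA Taxonomy Analysis Tool's implementation. The function names, the value of k, and the short sequences in the usage comment are all made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all overlapping length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_hit_fraction(read, reference_kmers, k):
    """Fraction of the read's distinct k-mers that occur in the reference set."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        # The read is shorter than k, so there is nothing to compare.
        return 0.0
    return len(read_kmers & reference_kmers) / len(read_kmers)


# Hypothetical usage with toy sequences (not real viral data):
#   reference = kmers("ACGTACGTGA", 4)
#   kmer_hit_fraction("ACGTAC", reference, 4)
```

A real classifier works with far larger k-mer databases and maps hits to taxonomic nodes; the sketch only shows the matching step.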

Rapid and reliable access to COVID-19 data is paramount to support research and management of the SARS-CoV-2 outbreak. Because this dataset is part of the AWS Public Dataset Program, researchers can access and egress critical datasets at no cost and get straight to the science. The data are publicly accessible natively from S3, so researchers can download and analyze them locally or compute on them directly in the cloud.
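As a rough illustration of what "publicly accessible natively from S3" can look like in practice, the sketch below lists a few objects in a public bucket over plain HTTPS, with no AWS credentials, using only the Python standard library. The bucket name is an assumption (it is not stated in this post), so check the dataset's page on the Registry of Open Data on AWS for the actual location. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: this bucket name does not appear in the post; replace it with
# the value shown on the dataset's Registry of Open Data page.
BUCKET = "sra-pub-sars-cov2"
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def list_url(bucket, prefix="", max_keys=10):
    """Build an unsigned S3 ListObjectsV2 URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_data):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_objects(bucket, prefix="", max_keys=10):
    """Fetch one page of object keys and sizes without AWS credentials."""
    with urllib.request.urlopen(list_url(bucket, prefix, max_keys)) as resp:
        return parse_listing(resp.read())


# Example (requires network access):
#   for key, size in list_objects(BUCKET):
#       print(size, key)
```

The same listing is available through the AWS CLI or an SDK; the standard-library version is used here only to keep the example free of extra dependencies.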

Work is currently underway to host this dataset on additional public data cloud platforms. Stay tuned!

**Figure 1.** NCBI’s COVID-19 Genome Sequence Dataset on the Registry of Open Data on AWS.

In case your research interests extend beyond Coronaviridae (CoV), you can explore the entire SRA dataset, hosted by the NCBI at the NLM, and on GCP and AWS as part of the NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative.

Getting started is easy! Refer to our detailed guidance and watch our how-to videos on YouTube to get started on AWS. Write to us at sra@ncbi.nlm.nih.gov to let us know what you think and how we can better serve your needs!


2 thoughts on “NIH’s COVID-focused Sequence Read Archive (SRA) datasets are now open access on AWS!”

Pingback: Taking COVID in STRIDES: The National Center for Biotechnology Information makes coronavirus genomic data available on AWS | AWS Public Sector Blog
Robert Youngblood says:

September 1, 2020 at 1:00 pm

Looking forward to reading the genome articles regarding COVID-19.
R. Youngblood
