Cloud-based Metadata Table

Overview

The Metadata Table (sra.metadata) contains information about each run and its biological sample. The biological sample data is stored in two different columns:

attributes: a RECORD (array) column that must be queried with the UNNEST operator
jattr: a JSON string
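
As a hedged sketch of the two access paths (an illustration, not an official NCBI example), the first query below expands the attributes array with UNNEST so that each key/value pair becomes its own row, and the second reads a single value from jattr with BigQuery's JSON_VALUE function. The second query assumes that jattr is a flat JSON object keyed by attribute name; the organism and the sex_calc attribute are borrowed from the pipefish example elsewhere on this page.

```sql
-- Path 1: expand the attributes array; each (k, v) pair becomes one output row.
SELECT s.acc, a.k, a.v
FROM `nih-sra-datastore.sra.metadata` AS s,
     UNNEST(s.attributes) AS a
WHERE s.organism = 'Syngnathus scovelli'
LIMIT 10;

-- Path 2: read one attribute from the JSON string.
-- Assumption: jattr is a flat object such as {"sex_calc": "female", ...}.
SELECT acc, JSON_VALUE(jattr, '$.sex_calc') AS sex
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = 'Syngnathus scovelli'
  AND JSON_VALUE(jattr, '$.sex_calc') = 'female'
LIMIT 10;
```

The two statements are independent and can be run one at a time in the web UI.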

Linking to other tables:

to the Taxonomy table by the organism column
to the Taxonomy Analysis Information (tax_analysis_info) table by the acc column
to the Taxonomy Analysis (tax_analysis) table by the acc column
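
A join on acc might look like the following sketch. The fully qualified name `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` is an assumption that does not appear on this page, so confirm the dataset name in the BigQuery console before running it. The query relies only on the acc column named above and returns whatever columns the analysis table has.

```sql
-- Return Taxonomy Analysis rows for runs whose metadata matches an organism.
-- Assumption: the dataset name sra_tax_analysis_tool is not given in this page.
SELECT t.*
FROM `nih-sra-datastore.sra.metadata` AS m
JOIN `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` AS t
  ON t.acc = m.acc
WHERE m.organism = 'Syngnathus scovelli'
LIMIT 10
```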

Column name	Type	Description
acc	STRING	SRA Run accession in the form of SRR######## (ERR or DRR for INSDC partners)
assay_type	STRING	Type of library (e.g. AMPLICON, RNA-Seq, WGS)
center_name	STRING	Name of the sequencing center
consent	STRING	Type of consent needed to access the data (e.g. public is available to all; other values are for dbGaP)
experiment	STRING	The accession in the form of SRX######## (ERX or DRX for INSDC partners)
sample_name	STRING	Name of the sample
instrument	STRING	Name of the sequencing instrument model
librarylayout	STRING	Whether the data is SINGLE or PAIRED
libraryselection	STRING	Library selection methodology (e.g. PCR, RANDOM)
librarysource	STRING	Source of the biological data (e.g. GENOMIC, METAGENOMIC)
platform	STRING	Name of the sequencing platform (e.g. ILLUMINA)
sample_acc	STRING	SRA Sample accession in the form of SRS######## (ERS or DRS for INSDC partners)
biosample	STRING	BioSample accession in the form of SAMN######## (SAMEA##### or SAMD##### for INSDC partners)
organism	STRING	Scientific name of the organism that was sequenced (as found in the NCBI Taxonomy Browser)
sra_study	STRING	SRA Study accession in the form of SRP######## (ERP or DRP for INSDC partners)
releasedate	TIMESTAMP	The date on which the data was released
bioproject	STRING	BioProject accession in the form of PRJNA######## (PRJEB####### or PRJDB###### for INSDC partners)
mbytes	INTEGER	Number of megabytes of data in the SRA Run
loaddate	TIMESTAMP	The date when the data was loaded into SRA
avgspotlen	INTEGER	Calculated average read length
mbases	INTEGER	Number of megabases in the SRA Run
insertsize	INTEGER	Submitter provided insert size
library_name	STRING	The name of the library
biosamplemodel_sam	STRING	The BioSample package/model that was picked
collection_date_sam	STRING	The collection date of the sample
geo_loc_name_country_calc	STRING	Name of the country where the sample was collected
geo_loc_name_country_continent_calc	STRING	Name of the continent where the sample was collected
geo_loc_name_sam	STRING	Full location of collection
ena_first_public_run	STRING	Date when the INSDC partner record became public
ena_last_update_run	STRING	Date when the INSDC partner record was last updated
sample_name_sam	STRING	INSDC sample name
datastore_filetype	STRING	Type of files available to download from SRA
datastore_provider	STRING	Locations from which the files can be downloaded
datastore_region	STRING	Regions where the data is located
attributes	RECORD	Full list of sample attributes in a nested (array) structure
attributes.k	STRING	Attribute's name
attributes.v	STRING	Attribute's value
jattr	STRING	JSON-based string of the sample attributes

Example queries for web UI

Search for records of an adult female pipefish:

SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Syngnathus scovelli'
  AND ('sex_calc', 'female') IN UNNEST(s.attributes)
  AND ('dev_stage_sam', 'Adult') IN UNNEST(s.attributes)
LIMIT 10

Find public human data sets using this query (LIMIT 10 caps the result at 10 rows; remove it to return all matches):

SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Homo sapiens'
  AND consent = 'public'
LIMIT 10
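
A summary query scans fewer columns than SELECT * and can be a lighter first look at the same data. The following sketch, offered as an illustration and not as an official example, counts public human runs per sequencing platform and totals their megabases, using only columns listed in the table above.

```sql
-- Count public human runs and total megabases per sequencing platform.
SELECT platform,
       COUNT(*) AS run_count,
       SUM(mbases) AS total_mbases
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = 'Homo sapiens'
  AND consent = 'public'
GROUP BY platform
ORDER BY run_count DESC
```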

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov
