Cloud-based Metadata Table

Overview

The Metadata Table (sra.metadata) contains information about each run and its biological sample. The biological sample data is stored in two different columns:

  1. attributes: a RECORD (array) column that you query with the UNNEST operator
  2. jattr: a JSON string
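
As a rough sketch of the two access paths, the queries below read sample attributes first through the attributes array and then through the jattr string. The run accession SRR0000000 is a placeholder, and the assumption that jattr uses the same key names as attributes.k (here sex_calc) is not confirmed by this page.

```sql
-- 1. attributes: UNNEST turns the array into one row per key/value pair.
SELECT s.acc, a.k, a.v
FROM `nih-sra-datastore.sra.metadata` AS s,
  UNNEST(s.attributes) AS a
WHERE s.acc = 'SRR0000000'  -- placeholder accession; substitute a real run
ORDER BY a.k;

-- 2. jattr: pull a single key out of the JSON string.
-- ASSUMPTION: jattr carries a 'sex_calc' key, mirroring attributes.k.
SELECT s.acc, JSON_EXTRACT(s.jattr, '$.sex_calc') AS sex
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE s.organism = 'Syngnathus scovelli'
LIMIT 10;
```

Listing the key/value pairs for one run first is a convenient way to discover which attribute names are available before filtering on them.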

Linking to other tables:

  • to Taxonomy table by organism column
  • to Taxonomy Analysis Information (tax_analysis_info) table by acc column
  • to Taxonomy Analysis (tax_analysis) table by acc column
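
A minimal sketch of linking by the acc column. This page names the tax_analysis table but not its full path; the dataset name sra_tax_analysis_tool used below is an assumption, so confirm it in the BigQuery console before running the query.

```sql
-- Join run metadata to its taxonomy analysis rows on the shared acc column.
-- ASSUMPTION: tax_analysis lives in the sra_tax_analysis_tool dataset.
SELECT m.acc, m.organism, t.* EXCEPT (acc)
FROM `nih-sra-datastore.sra.metadata` AS m
JOIN `nih-sra-datastore.sra_tax_analysis_tool.tax_analysis` AS t
  ON t.acc = m.acc
WHERE m.organism = 'Syngnathus scovelli'
LIMIT 10;
```

The EXCEPT (acc) clause keeps the result from carrying two columns named acc.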

Column name Type Description
acc STRING SRA Run accession in the form of SRR######## (ERR or DRR for INSDC partners)
assay_type STRING Type of library (e.g. AMPLICON, RNA-Seq, WGS)
center_name STRING Name of the sequencing center
consent STRING Type of consent needed to access the data (e.g. public is available to all, others are for dbGaP)
experiment STRING The accession in the form of SRX######## (ERX or DRX for INSDC partners)
sample_name STRING Name of the sample
instrument STRING Name of the sequencing instrument model
librarylayout STRING Whether the data is SINGLE or PAIRED
libraryselection STRING Library selection methodology (e.g. PCR, RANDOM)
librarysource STRING Source of the biological data (e.g. GENOMIC, METAGENOMIC)
platform STRING Name of the sequencing platform (e.g. ILLUMINA)
sample_acc STRING SRA Sample accession in the form of SRS######## (ERS or DRS for INSDC partners)
biosample STRING BioSample accession in the form of SAMN######## (SAMEA##### or SAMD##### for INSDC partners)
organism STRING Scientific name of the organism that was sequenced (as found in the NCBI Taxonomy Browser)
sra_study STRING SRA Study accession in the form of SRP######## (ERP or DRP for INSDC partners)
releasedate TIMESTAMP The date on which the data was released
bioproject STRING BioProject accession in the form of PRJNA######## (PRJEB####### or PRJDB###### for INSDC partners)
mbytes INTEGER Number of megabytes of data in the SRA Run
loaddate TIMESTAMP The date when the data was loaded into SRA
avgspotlen INTEGER Calculated average read length
mbases INTEGER Number of megabases in the SRA Run
insertsize INTEGER Submitter provided insert size
library_name STRING The name of the library
biosamplemodel_sam STRING The BioSample package/model that was picked
collection_date_sam STRING The collection date of the sample
geo_loc_name_country_calc STRING Name of the country where the sample was collected
geo_loc_name_country_continent_calc STRING Name of the continent where the sample was collected
geo_loc_name_sam STRING Full location of collection
ena_first_public_run STRING Date when the INSDC partner record became public
ena_last_update_run STRING Date when the INSDC partner record was last updated
sample_name_sam STRING INSDC sample name
datastore_filetype STRING Type of files available to download from SRA
datastore_provider STRING Locations from which the files can be downloaded
datastore_region STRING Regions where the data is located
attributes RECORD Full list of sample attributes in a nested (array) structure
attributes.k STRING Attribute's name
attributes.v STRING Attribute's value
jattr STRING JSON-formatted string of the sample attributes
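
The typed columns above can be filtered and aggregated directly. As an illustrative sketch (the cutoff date is arbitrary), this query counts public runs per platform released since the start of 2020 and sums their size:

```sql
SELECT
  platform,
  COUNT(*) AS runs,
  SUM(mbases) AS total_mbases,
  SUM(mbytes) AS total_mbytes
FROM `nih-sra-datastore.sra.metadata`
WHERE consent = 'public'
  AND releasedate >= TIMESTAMP('2020-01-01')  -- releasedate is a TIMESTAMP column
GROUP BY platform
ORDER BY runs DESC;
```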

Example queries for web UI

Search for records of an adult female pipefish:

SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Syngnathus scovelli'
  AND ('sex_calc', 'female') IN UNNEST(s.attributes)
  AND ('dev_stage_sam', 'Adult') IN UNNEST(s.attributes)
LIMIT 10
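
The IN UNNEST form above matches each key and value exactly. If the capitalization of a value varies between submissions, one possible alternative (a sketch, not the only way to write it) is an EXISTS subquery over the unnested array, which allows functions such as LOWER on the value:

```sql
SELECT s.acc, s.organism, s.biosample
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE s.organism = 'Syngnathus scovelli'
  AND EXISTS (
    -- true when the run has a sex_calc attribute equal to 'female', ignoring case
    SELECT 1 FROM UNNEST(s.attributes) AS a
    WHERE a.k = 'sex_calc' AND LOWER(a.v) = 'female'
  )
  AND EXISTS (
    -- true when the run has a dev_stage_sam attribute equal to 'adult', ignoring case
    SELECT 1 FROM UNNEST(s.attributes) AS a
    WHERE a.k = 'dev_stage_sam' AND LOWER(a.v) = 'adult'
  )
LIMIT 10;
```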

Find public human data sets (this example returns the first 10 records) using this query:

SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Homo sapiens'
  AND consent = 'public'
LIMIT 10
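
SELECT * reads every column of the table, and in BigQuery a LIMIT clause generally does not reduce the amount of data scanned. A hedged variant of the same search that names only the columns of interest (the particular columns chosen here are just an example) might look like this:

```sql
-- Read only the columns of interest instead of SELECT *.
SELECT acc, biosample, bioproject, assay_type, platform, mbases
FROM `nih-sra-datastore.sra.metadata`
WHERE organism = 'Homo sapiens'
  AND consent = 'public'
LIMIT 10;
```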

Contact SRA

Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov

Last updated: 2020-09-16T18:18:59Z