Cloud-based Metadata Table
Overview
The Metadata Table (sra.metadata) contains information about the runs and biological samples.
The biological sample data is stored in two different columns:
- attributes: a record (array) column that you query with UNNEST
- jattr: a JSON string
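
To make the two representations concrete, here is a minimal Python sketch of how one sample's attributes might look in each column once a row has been exported from the table. The row values and the helper names (attr_from_records, attr_from_json) are hypothetical, and the sketch assumes jattr holds a flat JSON object keyed by attribute name.

```python
import json

# Hypothetical exported row; the values are made up for illustration.
row = {
    "acc": "SRR0000001",
    # attributes: an array of records, each with a name (k) and a value (v)
    "attributes": [
        {"k": "sex_calc", "v": "female"},
        {"k": "dev_stage_sam", "v": "Adult"},
    ],
    # jattr: the sample attributes as a JSON string (assumed flat object)
    "jattr": '{"sex_calc": "female", "dev_stage_sam": "Adult"}',
}


def attr_from_records(records, name):
    """Return the value of the first record whose k equals name, else None."""
    for rec in records:
        if rec["k"] == name:
            return rec["v"]
    return None


def attr_from_json(jattr, name):
    """Parse the JSON string and look up name, returning None if absent."""
    return json.loads(jattr).get(name)
```

The record column needs a scan over its elements, which is what UNNEST does inside a query, while the JSON string has to be parsed before a key can be looked up.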
Linking to other tables:
- to the Taxonomy table by the organism column
- to the Taxonomy Analysis Information (tax_analysis_info) table by the acc column
- to the Taxonomy Analysis (tax_analysis) table by the acc column
Column name | Type | Description |
---|---|---|
acc | STRING | SRA Run accession in the form of SRR######## (ERR or DRR for INSDC partners) |
assay_type | STRING | Type of library (e.g. AMPLICON, RNA-Seq, WGS) |
center_name | STRING | Name of the sequencing center |
consent | STRING | Type of consent needed to access the data (e.g. public is available to all; other values are for dbGaP) |
experiment | STRING | The accession in the form of SRX######## (ERX or DRX for INSDC partners) |
sample_name | STRING | Name of the sample |
instrument | STRING | Name of the sequencing instrument model |
librarylayout | STRING | Whether the data is SINGLE or PAIRED |
libraryselection | STRING | Library selection methodology (e.g. PCR, RANDOM) |
librarysource | STRING | Source of the biological data (e.g. GENOMIC, METAGENOMIC) |
platform | STRING | Name of the sequencing platform (e.g. ILLUMINA) |
sample_acc | STRING | SRA Sample accession in the form of SRS######## (ERS or DRS for INSDC partners) |
biosample | STRING | BioSample accession in the form of SAMN######## (SAMEA##### or SAMD##### for INSDC partners) |
organism | STRING | Scientific name of the organism that was sequenced (as found in the NCBI Taxonomy Browser) |
sra_study | STRING | SRA Study accession in the form of SRP######## (ERP or DRP for INSDC partners) |
releasedate | TIMESTAMP | The date on which the data was released |
bioproject | STRING | BioProject accession in the form of PRJNA######## (PRJEB####### or PRJDB###### for INSDC partners) |
mbytes | INTEGER | Number of megabytes of data in the SRA Run |
loaddate | TIMESTAMP | The date when the data was loaded into SRA |
avgspotlen | INTEGER | Calculated average read length |
mbases | INTEGER | Number of megabases in the SRA Run |
insertsize | INTEGER | Submitter provided insert size |
library_name | STRING | The name of the library |
biosamplemodel_sam | STRING | The BioSample package/model that was picked |
collection_date_sam | STRING | The collection date of the sample |
geo_loc_name_country_calc | STRING | Name of the country where the sample was collected |
geo_loc_name_country_continent_calc | STRING | Name of the continent where the sample was collected |
geo_loc_name_sam | STRING | Full location of collection |
ena_first_public_run | STRING | Date when INSDC partner record was public |
ena_last_update_run | STRING | Date when INSDC partner record was updated |
sample_name_sam | STRING | INSDC sample name |
datastore_filetype | STRING | Type of files available to download from SRA |
datastore_provider | STRING | Locations from which the files can be downloaded |
datastore_region | STRING | Regions where the data is located |
attributes | RECORD | Full list of sample attributes in a nested (array) structure |
attributes.k | STRING | Attribute's name |
attributes.v | STRING | Attribute's value |
jattr | STRING | JSON-based string of the sample attributes |
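
The accession prefixes listed in the table can be used to decide which column an identifier belongs to. The following Python sketch is one way to do that; the function name column_for_accession is hypothetical, and the patterns cover only the prefixes named in the table, so other valid accession forms may not be recognized.

```python
import re

# Prefixes are taken from the column descriptions in the table. Digit counts
# are not enforced, because the table only shows '#' placeholders.
_ACCESSION_COLUMNS = [
    ("acc", re.compile(r"^[SED]RR\d+$")),            # SRA Run
    ("experiment", re.compile(r"^[SED]RX\d+$")),     # SRA Experiment
    ("sample_acc", re.compile(r"^[SED]RS\d+$")),     # SRA Sample
    ("sra_study", re.compile(r"^[SED]RP\d+$")),      # SRA Study
    ("biosample", re.compile(r"^SAM(N|EA|D)\d+$")),  # BioSample
    ("bioproject", re.compile(r"^PRJ(NA|EB|DB)\d+$")),  # BioProject
]


def column_for_accession(accession):
    """Return the metadata column that matches the accession, else None."""
    for column, pattern in _ACCESSION_COLUMNS:
        if pattern.match(accession):
            return column
    return None
```

For example, column_for_accession("ERX000001") returns "experiment", which tells you to filter on the experiment column rather than acc.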
Example queries for web UI
Search for records of an adult female pipefish:
SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Syngnathus scovelli'
  AND ('sex_calc', 'female') IN UNNEST(s.attributes)
  AND ('dev_stage_sam', 'Adult') IN UNNEST(s.attributes)
LIMIT 10
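
The attribute filter in the query above extends to any number of name/value pairs. The Python sketch below builds such a query as text, using BigQuery named parameters (@name) in place of inline literals. The helper build_attribute_query is hypothetical, it only produces the SQL string and a parameter dictionary, and whether the tuple form accepts parameters exactly as it accepts literals is an assumption that should be checked before use.

```python
def build_attribute_query(organism, attributes, limit=10):
    """Build query text and parameters for an organism plus attribute filters.

    attributes is a list of (name, value) pairs; each pair becomes one
    "(@kN, @vN) IN UNNEST(s.attributes)" predicate.
    """
    params = {"organism": organism}
    predicates = ["organism = @organism"]
    for i, (name, value) in enumerate(attributes):
        params[f"k{i}"] = name
        params[f"v{i}"] = value
        predicates.append(f"(@k{i}, @v{i}) IN UNNEST(s.attributes)")
    lines = [
        "SELECT *",
        "FROM `nih-sra-datastore.sra.metadata` AS s",
        "WHERE " + "\n  AND ".join(predicates),
        f"LIMIT {int(limit)}",  # int() rejects non-numeric limits
    ]
    return "\n".join(lines), params


sql, params = build_attribute_query(
    "Syngnathus scovelli",
    [("sex_calc", "female"), ("dev_stage_sam", "Adult")],
)
print(sql)
```

The printed text can be pasted into the web UI after substituting the parameter values, or passed with the parameters to a client library that supports named query parameters.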
Find all the public human data sets using this query:
SELECT *
FROM `nih-sra-datastore.sra.metadata` AS s
WHERE organism = 'Homo sapiens' AND consent = 'public'
LIMIT 10
Contact SRA
Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov