dbGaP Download Guide
Introduction
The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies investigating the interaction of genotype and phenotype in humans.
This guide outlines how to configure the SRA Toolkit for use with protected data from dbGaP.
Detailed information regarding the usage of individual tools in the SRA Toolkit can be found on the tool-specific documentation pages.
Starting with SRA Toolkit version 2.10.2, there are several important changes:
- You no longer need to import the NGC file to the configuration
- The NGC file will need to be specified as part of the command line every time you run a tool
- For SRA Runs, you no longer have an option to create a cart, but will need to use a list of Run accessions
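Since carts are no longer an option for SRA Runs, one way to work through a list of Run accessions is a small shell loop. The sketch below is only an illustration and is not part of the toolkit: the function name fetch_runs, the list file runs.txt, and the PREFETCH override variable are assumptions. Only the prefetch --ngc invocation is taken from this guide.

```shell
# Download every Run named in a plain-text file, one accession per line.
# Setting PREFETCH=echo gives a dry run that only prints the arguments.
fetch_runs() {
    ngc_file=$1      # repository key, e.g. your_file.ngc
    list_file=$2     # hypothetical accession list, e.g. runs.txt
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Skip blank lines and lines starting with '#'.
        case $acc in
            ''|'#'*) continue ;;
        esac
        # stdin is redirected so the tool cannot consume the list itself.
        "${PREFETCH:-prefetch}" --ngc "$ngc_file" "$acc" < /dev/null || return 1
    done < "$list_file"
}
```

A call might look like fetch_runs your_file.ngc runs.txt. The function stops at the first failed download so that a problem with the key or an accession is not silently skipped.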
Prerequisites
- You must have the latest release of the SRA Toolkit installed.
- You will need to run vdb-config -i a single time to generate the basic configuration setup. No options need to be set to use Toolkit version 2.10.4.
- Users who wish to access controlled-access data must first apply for approval. Please review the process at the Authorized Access Portal.
- Once granted access to a project, the PI may log in and click the "get dbGaP repository key" link next to the project to download the repository key (the NGC file). This file should be closely guarded.

For users that do not yet have an approved project, the test key prj_phs710EA_test.ngc is available for accessing a copy of 1000 Genomes data from NCBI. Downloading this key will allow users to test their toolkit configuration on encrypted data that is consented for public access.
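Before starting a download, it can help to confirm that the tools used in this guide are on the PATH and that the repository key is readable. The following is a minimal sketch under those assumptions. The function name check_prereqs is hypothetical, and the tool list simply mirrors the commands that appear in this guide.

```shell
# Report any missing tool or unreadable key; return non-zero if anything is missing.
check_prereqs() {
    ngc_file=$1      # repository key, e.g. your_file.ngc or the test key
    missing=0
    for tool in vdb-config prefetch fasterq-dump vdb-decrypt; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    if [ ! -r "$ngc_file" ]; then
        echo "cannot read repository key: $ngc_file" >&2
        missing=1
    fi
    return "$missing"
}
```

For example, check_prereqs prj_phs710EA_test.ngc would verify the setup against the test key before any protected data is requested.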
Downloading with NGC for use on any server
Downloading the data
To download the data, run the following command:
prefetch --ngc your_file.ngc SRR1234567
This will create a file named something like SRR1234567_dbgap_#####.sra. To decrypt the data, first rename the Run file by removing the '_dbgap_#####' portion:
mv SRR1234567_dbgap_#####.sra SRR1234567.sra
Then provide the NGC on the command line again:
fasterq-dump --ngc your_file.ngc SRR1234567.sra
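When many Runs are downloaded, the rename step can be scripted rather than typed by hand. The sketch below assumes that the downloaded file sits in the current directory (as the commands above imply) and that the ##### placeholder is a run of digits. The function names strip_dbgap_suffix and rename_dbgap_run are hypothetical helpers, not toolkit commands.

```shell
# Map e.g. SRR1234567_dbgap_12345.sra to SRR1234567.sra.
# Names without a _dbgap_<digits> part are returned unchanged.
strip_dbgap_suffix() {
    printf '%s\n' "$1" | sed 's/_dbgap_[0-9][0-9]*\.sra$/.sra/'
}

# Rename every matching file for one accession in the current directory.
rename_dbgap_run() {
    acc=$1
    for f in "${acc}"_dbgap_*.sra; do
        # If nothing matched, the pattern stays literal and is skipped here.
        [ -e "$f" ] || continue
        mv -- "$f" "$(strip_dbgap_suffix "$f")" || return 1
    done
}
```

After rename_dbgap_run SRR1234567, the fasterq-dump command shown above can be run against SRR1234567.sra.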
Downloading phenotype files with NGC
Similar to downloading protected SRA Runs, downloading phenotype files has also changed since Toolkit version 2.10.2.
In the dbGaP File Selector (available from your project's authorized access page), select at least one file to activate the Cart file button, then use that button to save the cart file.
To download the data, run the following command:
prefetch --ngc your_file.ngc cart_prj#####_###.krt
To decrypt the data, run vdb-decrypt, again providing the NGC on the command line:
vdb-decrypt --ngc your_file.ngc enc_file.xml
The SRA Toolkit version 2.9.6 visual configuration
Users who cannot upgrade to a newer 2.10 version should follow the SRA Toolkit version 2.9.6 visual configuration guide.
Accessing dbGaP Data on the Cloud
Contact SRA
Contact SRA staff for assistance at sra@ncbi.nlm.nih.gov