SRA End User Cloud Access Costs
Scenarios
In assessing what an end user's costs will be for accessing SRA data in the cloud, there are multiple scenarios to consider:
- Are the data in "hot" storage or "cold" storage?
  There are no end-user charges for accessing SRA data in the cloud, whether in hot or cold storage, provided the user accesses the data from the same cloud region where the data resides. For data in cold storage, however, the end user must provide a target cloud bucket to hold the data restored from cold to hot storage, and the user accrues storage costs for as long as that data is kept in the user's bucket.
  Typical cloud storage costs are $0.02 per GB per month, or $20 per TB per month. Note that NCBI limits the amount of data that can be restored from cold storage to 5 TB per user per month.
- What is the target compute platform: in the cloud or on premises?
  If cloud SRA data is to be computed on in the cloud, the user will pay cloud compute charges; these will vary widely based on the compute platform selected, the cloud service provider, and the processing run time.
  If the SRA data is transferred to an on-premises compute platform, the user will pay egress charges.
  Again, egress charges vary from service provider to service provider, but are typically about $0.09 per GB, or $90 per TB.
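The typical rates quoted above translate into a simple back-of-the-envelope estimate. The following is a minimal sketch, not an NCBI tool: the function and constant names are hypothetical, the decimal unit convention (1 TB = 1000 GB) is an assumption, and the rates are the typical figures from this page rather than any provider's current price list.

```python
# Typical rates quoted on this page; actual provider pricing varies.
STORAGE_USD_PER_GB_MONTH = 0.02
EGRESS_USD_PER_GB = 0.09
# NCBI limit on data restored from cold storage, per user per month.
RESTORE_LIMIT_TB_PER_MONTH = 5
GB_PER_TB = 1000  # assumption: decimal units


def monthly_storage_cost(size_gb: float, months: float = 1.0) -> float:
    """Estimated cost of keeping restored data in a user bucket."""
    return size_gb * STORAGE_USD_PER_GB_MONTH * months


def egress_cost(size_gb: float) -> float:
    """Estimated one-time cost of transferring data out of the cloud."""
    return size_gb * EGRESS_USD_PER_GB


def within_restore_limit(size_gb: float) -> bool:
    """True if a month's cold-storage restores fit under the NCBI limit."""
    return size_gb <= RESTORE_LIMIT_TB_PER_MONTH * GB_PER_TB


if __name__ == "__main__":
    size_gb = 1000.0  # 1 TB of SRA data
    print(f"Storage for one month: ${monthly_storage_cost(size_gb):.2f}")
    print(f"Egress: ${egress_cost(size_gb):.2f}")
    print(f"Within restore limit: {within_restore_limit(size_gb)}")
```

Compute charges are deliberately left out, since they depend on the platform, provider, and run time.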
Note that the SRA Toolkit by default will attempt to retrieve SRA files from the lowest cost alternative for the user.
When computing in the cloud, this will usually mean retrieving SRA data from the cloud. In some cases, this will mean downloading
SRA files from NCBI rather than the cloud. However, due to space limitations, some files will only be available in the cloud.
To illustrate user costs of accessing data in the cloud, the following examples assume the data of interest are only available in the cloud.
To cover all scenarios, consider a matrix with every combination of data storage class and target compute platform, as follows:
Storage | Hot | Cold |
---|---|---|
Target Platform: In Cloud | Case 1 | Case 2 |
Target Platform: On Premises | Case 3 | Case 4 |
User charges per case scenario
- Case 1: The SRA data of interest are in hot storage, target compute platform is the cloud:
  a. User pays only compute charges in the cloud.
- Case 2: SRA data are in cold storage, target compute platform is the cloud:
  a. User pays storage charges for data restored from cold storage, for as long as the user wants to keep their own copy.
  b. User pays compute charges in the cloud.
- Case 3: SRA data are in hot storage, target compute platform is on premises:
  a. User pays egress charges to the target compute environment.
- Case 4: SRA data are in cold storage, target compute platform is on premises:
  a. User creates a temporary bucket to receive SRA data being restored from cold storage; since the user intends to egress the data, this can be short term, but the user pays storage costs for as long as they keep the data in their personal bucket.
  b. User pays egress charges to the target compute environment.
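The four cases above reduce to two independent rules: cold storage adds a storage charge, and the target platform determines whether the user pays cloud compute or egress. A minimal sketch of that lookup follows; the enum and function names are hypothetical and chosen only for illustration.

```python
from enum import Enum


class Storage(Enum):
    HOT = "hot"
    COLD = "cold"


class Platform(Enum):
    CLOUD = "in cloud"
    ON_PREMISES = "on premises"


def user_charges(storage: Storage, platform: Platform) -> list[str]:
    """Return the kinds of cloud charges the user pays for one case."""
    charges = []
    if storage is Storage.COLD:
        # Restored data lands in a user-owned bucket, which accrues storage cost.
        charges.append("storage")
    if platform is Platform.CLOUD:
        charges.append("compute")
    else:
        # Moving data to an on-premises platform incurs egress charges.
        charges.append("egress")
    return charges


if __name__ == "__main__":
    for storage in Storage:
        for platform in Platform:
            print(storage.value, platform.value, user_charges(storage, platform))
```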
Examples of costs
The table below shows some examples of user-paid costs, either cloud storage costs or egress costs, depending on the scenario.
Note that cloud compute costs are not shown, since those will vary widely based on the particular research being done.
Also note that when we use the term "cold storage", we are referring to S3 Glacier Deep Archive in the case of AWS,
and GCP Archive Storage in the case of GCP.
Three SRA accessions are used as examples, as follows:
Accession | Dataset Size | Description |
---|---|---|
SRR10173704 | 254 MB | Whole genome Illumina MiSeq sequence of Salmonella enterica |
SRR2052362 | 8,106 MB | NIST Genome in a Bottle, ~300X sequencing of HG001 (NA12878)- 131223_D00360_007_BH88WKADXX- Sample_U5c |
SRR7781428 | 9,273 MB | Downsampled whole genome sequence alignment of human HapMap sample NA12878; sequenced by McDonnell Genome Institute, aligned by University of Michigan |
User cost examples
AWS
AWS | SRR10173704 | SRR2052362 | SRR7781428 |
---|---|---|---|
Retrieval within same AWS Region | | | |
S3 Standard | $0.00 | $0.00 | $0.00 |
S3 Glacier Deep Archive | $0.00 | $0.00 | $0.00 |
Monthly Cloud Storage Costs | $0.01 | $0.16 | $0.19 |
Retrieval out of AWS (Egress charges) | | | |
Any storage class | $0.03 | $0.72 | $0.82 |
Monthly Cloud Storage Costs | N/A | N/A | N/A |
GCP
GCP | SRR10173704 | SRR2052362 | SRR7781428 |
---|---|---|---|
Retrieval within same GCP Region | | | |
GCP Standard Storage | $0.00 | $0.00 | $0.00 |
GCP Archive Storage | $0.00 | $0.00 | $0.00 |
Monthly Cloud Storage Costs | $0.01 | $0.16 | $0.19 |
Retrieval out of GCP (Egress charges) | | | |
Any storage class | $0.03 | $0.67 | $0.77 |
Monthly Cloud Storage Costs | N/A | N/A | N/A |
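As a rough cross-check, per-accession figures can be estimated from the dataset sizes and the typical rates quoted on this page ($0.02 per GB per month for storage, $0.09 per GB for egress). The sketch below is illustrative only: the names are hypothetical, and treating 1 GB as 1024 MB and rounding up to the next cent are assumptions, not a documented NCBI method. The GCP egress rows reflect a different rate that this page does not state, so they are not modeled here.

```python
import math

# Dataset sizes in MB, taken from the accession table on this page.
ACCESSION_SIZES_MB = {
    "SRR10173704": 254,
    "SRR2052362": 8106,
    "SRR7781428": 9273,
}

STORAGE_USD_PER_GB_MONTH = 0.02  # typical rate quoted on this page
EGRESS_USD_PER_GB = 0.09         # typical rate quoted on this page


def estimate_usd(size_mb: float, usd_per_gb: float, mb_per_gb: int = 1024) -> float:
    """Estimate a charge in dollars, rounded up to the next whole cent.

    The 1024 MB-per-GB default and the round-up rule are assumptions.
    """
    cents = size_mb / mb_per_gb * usd_per_gb * 100
    return math.ceil(cents) / 100


if __name__ == "__main__":
    for accession, size_mb in ACCESSION_SIZES_MB.items():
        storage = estimate_usd(size_mb, STORAGE_USD_PER_GB_MONTH)
        egress = estimate_usd(size_mb, EGRESS_USD_PER_GB)
        print(f"{accession}: storage ${storage:.2f}/month, egress ${egress:.2f}")
```

Under these assumptions the estimates line up with the monthly storage rows and the AWS egress row, but a different unit convention or rounding rule would shift some values by a cent or two.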
Engage
NCBI wants your feedback on SRA in the Cloud. Contact sra@ncbi.nlm.nih.gov with questions or if you would like to provide input on new functionality.