Upcoming Changes to EST and GSS Databases

Update: NCBI is now in the process of merging EST and GSS records into the Nucleotide database, and we expect to complete this process in early 2019. Accession.version and GI identifiers will not change during this process.

As of December 1, 2018, all records from the databases for Expressed Sequence Tags (EST) and Genome Survey Sequences (GSS) will reside in NCBI’s Nucleotide database. This change will provide a single point of access for all GenBank sequence data with a common look and feel.

Read more to learn about how this change affects these resources:

Websites (Entrez)
APIs (E-utilities)
FTP sites
Submission procedures
BLAST
TSA (have a look if you’re not familiar!)

Why are we doing this?

Sequencing technologies have moved away from generating ESTs for evaluating gene expression and GSSs for evaluating clone libraries in favor of next-generation data, which is deposited in resources such as the Sequence Read Archive (SRA). Consolidating the EST and GSS datasets thus helps us to align our services to the current needs of the bioinformatics community.

Changes to websites

The most notable change is that the EST (nucest) and GSS (nucgss) Entrez databases will be retired, along with the default EST and GSS record formats. All EST and GSS records will be moved to the Nucleotide (nuccore) database and will have the default “GenBank” view shared by all current Nucleotide records (see Figure 1). New filters will be added to the Nucleotide database to make it easy for users to remove or include EST and GSS sequences in search results.

**Figure 1.** Sample EST record displayed in the “GenBank” format that will become the default format for ESTs and GSSs after December 1, 2018.

Changes to APIs

Similar to the changes on the web, there will no longer be separate EST and GSS databases (db=nucest and db=nucgss) in the E-utilities API. Requests containing these values will be redirected to db=nuccore. The same redirection applies to dbfrom in ELink requests. However, values of linkname containing nucest or nucgss will be ignored after December 1, so such requests will return results unrestricted by linkname, containing all possible links between dbfrom and db.
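For scripts that still send the retired database names, the redirection described above can be applied on the client side ahead of time. The following is a minimal sketch, not an NCBI-provided tool: the helper name `migrate_eutils_params` and the sample `linkname` value are hypothetical, and it assumes request parameters are held in a plain dictionary.

```python
# Hypothetical helper: rewrites E-utilities parameters that reference the
# retired nucest/nucgss databases, following the behavior described above.
LEGACY_DBS = {"nucest", "nucgss"}


def migrate_eutils_params(params):
    """Return (updated_params, notes) without modifying the input dict."""
    updated = dict(params)
    notes = []

    # db and dbfrom values naming a retired database map to nuccore.
    for key in ("db", "dbfrom"):
        if updated.get(key) in LEGACY_DBS:
            notes.append(f"{key}={updated[key]} -> {key}=nuccore")
            updated[key] = "nuccore"

    # A linkname that mentions a retired database will be ignored by the
    # server, so drop it here and record that the result is unrestricted.
    linkname = updated.get("linkname", "")
    if any(db in linkname for db in LEGACY_DBS):
        notes.append(f"linkname={linkname} dropped (ignored by the server)")
        del updated["linkname"]

    return updated, notes


if __name__ == "__main__":
    # The linkname below is an illustrative value only.
    old = {"dbfrom": "nucest", "db": "gene", "linkname": "nucest_gene"}
    new, notes = migrate_eutils_params(old)
    print(new)
    for note in notes:
        print("note:", note)
```

Reporting the dropped linkname rather than silently removing it is a deliberate choice here: a request that loses its linkname returns all possible links between dbfrom and db, which may be far more than the original script expected.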

Users accessing the Nucleotide database (db=nuccore, db=nucleotide) with esearch should note that after December 1, search results will also include any EST and GSS records matching the provided query (term). This may markedly increase the number of records returned. To remove EST and GSS sequences from esearch results, add the following terms to your query:

NOT gbdiv_est[prop] NOT gbdiv_gss[prop]
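As an illustration, the exclusion terms can be appended to a query when building an esearch request. This is a minimal sketch using only the Python standard library; the function name `build_esearch_url` and the example query are assumptions made for illustration, and the filter is written in the underscore form (`gbdiv_est[prop]`, `gbdiv_gss[prop]`) used by Entrez property terms. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EXCLUDE_EST_GSS = "NOT gbdiv_est[prop] NOT gbdiv_gss[prop]"


def build_esearch_url(term, exclude_est_gss=True, retmax=20):
    """Build an esearch URL against nuccore (hypothetical helper)."""
    if exclude_est_gss:
        # Parenthesize the user's query so the NOT terms apply to all of it.
        term = f"({term}) {EXCLUDE_EST_GSS}"
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example query chosen for illustration only.
    print(build_esearch_url("Homo sapiens[orgn] AND kinase"))
```

Wrapping the original query in parentheses keeps the exclusion from binding to only the last clause of a query that contains OR.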

Changes to FTP sites

EST and GSS data have been part of the regular GenBank release set for many years, and will continue to be available at ftp.ncbi.nlm.nih.gov/genbank/. These data will be in the standard GenBank format. After December 1, 2018, the current specialized (default) EST and GSS formats will no longer be available by FTP at ftp.ncbi.nlm.nih.gov/repository/dbEST and ftp.ncbi.nlm.nih.gov/repository/dbGSS/.

Changes to submission procedures

We will continue to accept submissions of EST and GSS sequences; however, there will no longer be special processes for submitting these sequence types. We recommend that submitters of EST and GSS data begin using the tool tbl2asn now. This tool will be required after December 1, 2018. Please write to gb-admin@ncbi.nlm.nih.gov for more information.

Changes to BLAST

For many years BLAST has supported distinct databases for EST and GSS data, and these are available from the database pulldown on nucleotide BLAST web pages. We will continue to support these databases beyond December 1, 2018, for both web and standalone BLAST, so there is no need to alter any process that depends on these databases.

TSA – Have a look!

Finally, we encourage interested users to consider TSA (transcript shotgun assembly) data as a rich source of information about expressed sequences. TSA data are computational assemblies of sequence reads, and as such form attractive BLAST databases useful for identifying putative transcripts (choose “Transcript Shotgun Assembly (TSA)” from the nucleotide BLAST database menu).

We thank all past and present submitters of EST and GSS data for the invaluable benefit these data have provided to numerous genomic sequencing projects over the years. Please let us know if you have any questions or concerns about these changes!

10 thoughts on “Upcoming Changes to EST and GSS Databases”

nick nabok says:

September 26, 2018 at 7:01 am

Elimination of the “Nucleotide” database is an action of terrorism. It is even worse than the elimination of ORF finder, which you did a couple years ago.

1. NCBI Staff says:
  
  September 26, 2018 at 11:27 am
  
  We are consolidating the GSS and EST datasets within the Nucleotide database. We are not eliminating Nucleotide.
  
  ORFfinder is an NCBI Labs experiment, so it’s subject to change. However, it’s still accessible here: https://www.ncbi.nlm.nih.gov/orffinder/
  
  If you notice any functionality missing from this version of ORFfinder, please let us know at info@ncbi.nlm.nih.gov.
  
  Thank you for your feedback!
  
Gabriela Sevillano says:

March 16, 2019 at 9:40 am

when we can visualize the sequences that have been uploaded after December 1?

E nagapriyanka says:

April 13, 2019 at 5:36 am

gene bank sequence is not uploading and not showing on the screen it shows GSS and EST nucleotide

1. NCBI Staff says:
  
  April 15, 2019 at 12:00 pm
  
  Please send us an email at info@ncbi.nlm.nih.gov so we can help you.
  
Hugo Mejía-Madrid says:

June 10, 2019 at 11:59 am

I entered ORF and it answered me with ‘sequence is too short’. When should I expect to recover any of the “old” sequences? I have to build a phylogeny right now and find out I cannot access sequences anymore.

1. NCBI Staff says:
  
  June 13, 2019 at 12:32 pm
  
  Hello Hugo, please send us an email at info@ncbi.nlm.nih.gov if you’re still having trouble accessing the sequences you need.
  
honey modi says:

June 10, 2019 at 6:37 pm

gene bank sequence is not uploading and not showing on the screen it shows GSS and EST nucleotide

1. NCBI Staff says:
  
  June 13, 2019 at 12:33 pm
  
  Please send us an email at info@ncbi.nlm.nih.gov so we can help.
  
Pingback: EST and GSS databases now retired | NCBI Insights
