We have recently added several exciting improvements to the SARS-CoV-2 GenBank submission process based on community feedback. To save you time, NCBI completes feature annotation for you, which means SARS-CoV-2 GenBank submission only requires a FASTA file and source metadata. Here are some other new features that simplify your submission workflow.
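As a rough illustration of those two inputs, the Python sketch below renders a FASTA file and a tab-delimited source metadata table keyed by sequence ID. The column names (`Sequence_ID`, `isolate`, `country`, `host`, `collection-date`) follow the GenBank source-modifier table convention as we understand it and are an assumption here; the SARS-CoV-2 submission page lists the fields actually required. The helper names and all IDs and values are hypothetical placeholders.

```python
import csv
import io

# Assumed GenBank source-modifier columns; verify against the SARS-CoV-2 submission page.
SOURCE_COLUMNS = ["Sequence_ID", "isolate", "country", "host", "collection-date"]


def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Render {sequence_id: sequence} as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for seq_id, seq in records.items():
        lines.append(f">{seq_id}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def to_source_table(rows: list[dict[str, str]], columns: list[str] = SOURCE_COLUMNS) -> str:
    """Render source metadata as a tab-delimited table with a header row.

    Missing values in a row are written as empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder data only; the Sequence_ID must match the FASTA definition line.
    seqs = {"sample01": "ACGT" * 30}
    meta = [{
        "Sequence_ID": "sample01",
        "isolate": "example-isolate-01",
        "country": "USA",
        "host": "Homo sapiens",
        "collection-date": "2021-06-01",
    }]
    with open("sequences.fsa", "w") as fh:
        fh.write(to_fasta(seqs))
    with open("source.tsv", "w") as fh:
        fh.write(to_source_table(meta))
```

The key design point is that every row of the metadata table is joined to a sequence by its ID, so the two files must use identical identifiers.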
Automatically remove failed sequences from a submission: On the web, a single click lets you opt in to automatic removal of failed sequences (Figure 1) so that the rest of your sequences can be swiftly accessioned! A report provided after submission lists the failed sequences and points out potential sequence problems, so you can take a closer look after your error-free sequences are released. This option is also available for FTP submissions.
Need to set up FTP submissions? The NCBI team is here to help. Contact gb-admin@ncbi.nlm.nih.gov.
Figure 1. GenBank submission page showing the option to remove sequences with processing errors.
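If you opt in to automatic removal, you may want to set the failed sequences aside locally for review and resubmission. The sketch below assumes you have already copied the failed sequence IDs from the post-submission report into a Python set; the report's format is not described here, so parsing it is left out. `parse_fasta` and `split_by_failures` are hypothetical helper names.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {sequence_id: sequence}.

    The ID is the first whitespace-delimited token after '>'.
    Multi-line sequences are joined; blank lines are ignored.
    """
    records: dict[str, str] = {}
    seq_id = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records[seq_id] = "".join(chunks)
            seq_id = line[1:].split()[0]
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


def split_by_failures(records: dict[str, str], failed_ids: set[str]):
    """Return (passed, failed) dictionaries based on a set of failed IDs."""
    passed = {k: v for k, v in records.items() if k not in failed_ids}
    failed = {k: v for k, v in records.items() if k in failed_ids}
    return passed, failed
```

Keeping the failed subset in its own file makes it easy to inspect the flagged problems without touching the sequences that were accessioned.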
Automatic annotation enhancements based on data analysis: Many submitters tell us that our annotation tool VADR (Viral Annotation DefineR) is useful both for SARS-CoV-2 annotation and for preparing submissions. Based on data trends observed over the course of the pandemic, we recently updated VADR so that sequences with certain common issues, such as frameshifts in the ORF3a, ORF6, ORF7a, ORF7b, ORF8, or ORF10 coding regions, now pass instead of failing. In such cases, the affected coding region is reported as a miscellaneous feature (misc_feature) and no corresponding protein sequence entry is created. This extends to the other listed ORFs the treatment that ORF8 has received since February 2021.
If you’d like to run VADR on your sequences before submitting, you can install and run it locally by following the instructions on GitHub.
Run NCBI’s ‘FASTAedit’ command-line binary, now publicly available, to prepare sequences: We just released a public version of our FASTAedit tool, which allows you to:
- Identify and handle low-similarity discrepancies when running VADR prior to GenBank submission.
- Trim sequences the same way NCBI does when processing GenBank submissions, so your trimming stays consistent across analyses.
To get started, review our documentation and download the tool. In addition to FASTAedit, the NCBI Toolbox has a full suite of command-line tools for converting, manipulating, and validating submission-related file types.
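FASTAedit's exact trimming rules are documented with the tool and are not reproduced here. As an assumption about one common pre-submission cleanup step, the Python sketch below removes runs of N from both ends of each sequence while leaving internal Ns alone. It is not FASTAedit's algorithm; for trimming that matches NCBI processing, use FASTAedit itself. `trim_terminal_ns` and `trim_all` are hypothetical names.

```python
def trim_terminal_ns(seq: str) -> tuple[str, int, int]:
    """Strip leading and trailing runs of N/n.

    Returns (trimmed_sequence, n_removed_left, n_removed_right).
    Internal Ns are left untouched.
    """
    left_trimmed = seq.lstrip("Nn")
    n_left = len(seq) - len(left_trimmed)
    trimmed = left_trimmed.rstrip("Nn")
    n_right = len(left_trimmed) - len(trimmed)
    return trimmed, n_left, n_right


def trim_all(records: dict[str, str]) -> dict[str, str]:
    """Trim terminal Ns from every sequence, printing one log line per trimmed ID."""
    out = {}
    for seq_id, seq in records.items():
        trimmed, n_left, n_right = trim_terminal_ns(seq)
        if n_left or n_right:
            print(f"{seq_id}: trimmed {n_left} leading and {n_right} trailing N(s)")
        out[seq_id] = trimmed
    return out
```

Returning the counts alongside the trimmed sequence keeps a record of what changed, which helps when comparing your local results with NCBI's processed records.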
New SARS-CoV-2 BioSample packages for GenBank (and SRA) submission: We also have a dedicated BioSample package for SARS-CoV-2 clinical data to help you understand and prepare the relevant metadata when submitting to GenBank or the Sequence Read Archive (SRA)! We designed this framework around what the community needs to support surveillance, while also assisting data submitters. As a FAIR, open data archive, we hope these standardized fields will capture more useful information to aid public health and epidemiology. You can use the new SARS-CoV-2 clinical BioSample package type for both web (Figure 2) and programmatic XML/FTP submissions.
Tip: If you submit both raw next-generation sequencing (NGS) reads to SRA and the assembled sequences to GenBank, you can use your BioProject and BioSample to link those accessions and connect all of your data.
Are you a public health lab participating in wastewater surveillance? In response to the growth in wastewater monitoring and surveillance, we now also offer a BioSample package for submitting wastewater samples (Figure 2).
Figure 2. Selection for the SARS-CoV-2 packages: clinical or host-associated, and wastewater surveillance.
You can always find the latest details on how to submit your SARS-CoV-2 sequences on the SARS-CoV-2 submission page. Stay tuned for future enhancements to NCBI submission!