{"id":2044,"date":"2018-01-16T13:54:41","date_gmt":"2018-01-16T18:54:41","guid":{"rendered":"http:\/\/ncbiinsights.ncbi.nlm.nih.gov\/?p=2044"},"modified":"2020-08-28T14:04:37","modified_gmt":"2020-08-28T18:04:37","slug":"5-ncbi-articles-2018-nucleic-acids-research-database-issue","status":"publish","type":"post","link":"https:\/\/ncbiinsights.ncbi.nlm.nih.gov\/2018\/01\/16\/5-ncbi-articles-2018-nucleic-acids-research-database-issue\/","title":{"rendered":"5 NCBI articles in 2018 Nucleic Acids Research database issue"},"content":{"rendered":"
The 2018 Nucleic Acids Research database issue<\/a> features several papers from NCBI staff that cover the status and future of databases including CCDS, ClinVar, GenBank and RefSeq. These papers are also available on PubMed<\/a>. To read an article, click on the PMID number listed below.<\/p>\n <\/p>\n by NCBI Resource Coordinators (PMID: 29140470<\/a>)<\/em><\/p>\n The\u00a0National\u00a0Center\u00a0for\u00a0Biotechnology\u00a0Information\u00a0(NCBI) provides a large suite of online\u00a0resources\u00a0for biological\u00a0information\u00a0and data, including the GenBank\u00ae nucleic acid sequence\u00a0database\u00a0and the PubMed\u00a0database\u00a0of citations and abstracts for published life science journals.<\/p>\n The Entrez system provides search and retrieval operations for most of these data from 39 distinct databases. The E-utilities serve as the programming interface for the Entrez system. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets.<\/p>\n New\u00a0resources\u00a0released in the past year include PubMed Data Management, RefSeq Functional Elements, genome data download, variation services API, Magic-BLAST, QuickBLASTp, and Identical Protein Groups.\u00a0Resources\u00a0that were updated in the past year include the genome data viewer, a human genome\u00a0resources\u00a0page, Gene, virus variation, OSIRIS, and PubChem.<\/p>\n All of these\u00a0resources\u00a0can be accessed through the NCBI home page<\/a>.<\/p><\/blockquote>\n by Dennis A Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, James Ostell, Kim D Pruitt and Eric W Sayers (PMID: 29140468<\/a>)<\/em><\/p>\n GenBank\u00ae<\/a> is a comprehensive database that contains publicly available nucleotide sequences for 400 000 formally described species. 
These sequences are obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun and environmental sampling projects.<\/p>\n Most submissions are made using BankIt, the National Center for Biotechnology Information (NCBI) Submission Portal, or the tool tbl2asn.\u00a0GenBank\u00a0staff assign accession numbers upon data receipt. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage.<\/p>\n GenBank\u00a0is accessible through the NCBI Nucleotide database, which links to related information such as taxonomy, genomes, protein sequences and structures, and biomedical journal literature in PubMed. BLAST provides sequence similarity searches of\u00a0GenBank\u00a0and other sequence\u00a0databases.<\/p>\n Complete bimonthly releases and daily updates of the\u00a0GenBank\u00a0database are available by FTP. Recent updates include changes to sequence identifiers, submission wizards for 16S and Influenza sequences, and an Identical Protein Groups resource.<\/p><\/blockquote>\n by Shashikant Pujar, Nuala A O\u2019Leary, Catherine M Farrell, Jane E Loveland, Jonathan M Mudge et al. (PMID: 29126148<\/a>)<\/em><\/p>\n The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI.<\/p>\n This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. 
Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID).<\/p>\n Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page<\/a> and an FTP site<\/a>.<\/p>\n In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community.<\/p><\/blockquote>\n by Daniel H Haft, Michael DiCuccio, Azat Badretdin, Vyacheslav Brover, Vyacheslav Chetvernin et al. (PMID: 29112715<\/a>)<\/em><\/p>\n The Reference Sequence (RefSeq<\/a>) project at the National Center for Biotechnology Information (NCBI) provides annotation for over 95 000 prokaryotic genomes that meet standards for sequence quality, completeness, and freedom from contamination.<\/p>\n Genomes are annotated by a single Prokaryotic Genome Annotation Pipeline (PGAP) to provide users with a resource that is as consistent and accurate as possible. 
Notable recent changes include the development of a hierarchical evidence scheme, a new focus on curating annotation evidence sources, the addition and curation of protein profile hidden Markov models (HMMs), release of an updated pipeline (PGAP-4), and comprehensive re-annotation of RefSeq prokaryotic genomes.<\/p>\n Antimicrobial resistance proteins have been reannotated comprehensively, improved structural annotation of insertion sequence transposases and selenoproteins is provided, curated complex domain architectures have given upgraded names to millions of multidomain proteins, and we introduce a new kind of annotation rule: BlastRules.<\/p>\n Continual curation of supporting evidence, and propagation of improved names onto RefSeq proteins ensures that the functional annotation of genomes is kept current. An increasing share of our annotation now derives from HMMs and other sets of annotation rules that are portable by nature, and available for download and for reuse by other investigators.<\/p><\/blockquote>\n by Melissa J Landrum, Jennifer M Lee, Mark Benson, Garth Brown, Chen Chao et al. (PMID: 29165669<\/a>)<\/em><\/p>\n\u201cDatabase resources of the National Center for Biotechnology Information\u201d<\/h2>\n
\u201cGenBank\u201d<\/h2>\n
\u201cConsensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation\u201d<\/h2>\n
\u201cRefSeq: an update on prokaryotic genome annotation and curation\u201d<\/h2>\n
\u201cClinVar: improving access to variant interpretations and supporting evidence\u201d<\/h2>\n