Bioinformatics. 2017 Nov 1;33(21):3454-3460.
doi: 10.1093/bioinformatics/btx439.

On expert curation and scalability: UniProtKB/Swiss-Prot as a case study

Sylvain Poux et al. Bioinformatics. 2017.

Abstract

Motivation: Biological knowledgebases, such as UniProtKB/Swiss-Prot, constitute an essential component of daily scientific research by offering distilled, summarized and computable knowledge extracted from the literature by expert curators. While knowledgebases play an increasingly important role in the scientific community, their ability to keep up with the growth of biomedical literature is under scrutiny. Using UniProtKB/Swiss-Prot as a case study, we address this concern via multiple literature triage approaches.

Results: With the assistance of the PubTator text-mining tool, we tagged more than 10 000 articles to assess the ratio of papers relevant for curation. We first show that curators read and evaluate many more papers than they curate, so the number of curated publications alone gives an incomplete picture: 8000-10 000 papers are curated in UniProt each year, while curators evaluate 50 000-70 000 papers per year. We also show that 90% of the papers in PubMed are out of the scope of UniProt, that a maximum of 2-3% of the papers indexed in PubMed each year are relevant for UniProt curation, and that, despite appearances, expert curation in UniProt is scalable.

Availability and implementation: UniProt is freely available at http://www.uniprot.org/.

Contact: sylvain.poux@sib.swiss.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1
Screenshot of the PubTator tool. Some of PubTator’s functionalities include: (1) export of PubMed identifiers and annotations for the different sets (e.g. curatable and not-curatable); (2) menu for not-curatable options; access to abstract with annotations; and table with annotations and with links to UniProt accessions.
