GOAnnotator: linking protein GO annotations to evidence text

Francisco M Couto et al. J Biomed Discov Collab. 2006 Dec 20;1:19.
doi: 10.1186/1747-5333-1-19.

Abstract

Background: Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and valuable, but it is time-consuming. As a result, most proteins carry uncurated annotations that were generated automatically rather than curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators.

Results: In this paper we describe an approach that links uncurated annotations to text extracted from the literature. Text is selected according to its similarity to the term in the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. The approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins.
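
As an illustration of selecting text by its similarity to an annotation term, the following minimal sketch ranks sentences by how many words of a GO term name they contain. The abstract does not specify the similarity measure used by GOAnnotator, so the word-overlap score, the function names (`term_similarity`, `best_evidence`), the threshold, and the example sentences are all assumptions made for illustration only.

```python
import re


def tokens(text):
    """Return the lowercase word tokens of a string as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def term_similarity(sentence, term_name):
    """Fraction of the term's words that occur in the sentence (0.0 to 1.0).

    This word-overlap score is a stand-in, not GOAnnotator's measure.
    """
    term_words = tokens(term_name)
    if not term_words:
        return 0.0
    return len(term_words & tokens(sentence)) / len(term_words)


def best_evidence(sentences, term_name, threshold=0.5):
    """Return (score, sentence) pairs at or above the threshold, best first."""
    scored = [(term_similarity(s, term_name), s) for s in sentences]
    return sorted((p for p in scored if p[0] >= threshold), key=lambda p: -p[0])


# Hypothetical sentences standing in for an abstract linked to a protein.
abstract = [
    "The protein shows strong DNA binding activity in vitro.",
    "Expression was measured in liver tissue.",
    "Binding to the membrane was not observed.",
]
hits = best_evidence(abstract, "DNA binding")
for score, sentence in hits:
    print(f"{score:.2f}  {sentence}")
```

A score of this kind only orders candidate sentences; whether a sentence is valid evidence remains the curator's decision.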

Conclusion: GO curators assessed GOAnnotator on a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text with 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to the GO terms of the uncurated annotations in GOA. Our approach is the first one to achieve high precision, which is crucial for the efficient support of GO curators. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.
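
The idea of keeping only extracted terms that lie close to the uncurated term in the GO hierarchy can be sketched as follows. This is not the measure used by GOAnnotator, which the abstract does not describe: the shared-ancestor (Jaccard) score, the `min_sim` cutoff, and the hand-made `PARENTS` table are assumptions, and the table is a simplified toy fragment rather than data loaded from GO.

```python
# Toy is-a hierarchy (child -> parents). Hand-made and simplified for
# illustration; it is not an extract of the real Gene Ontology.
PARENTS = {
    "molecular_function": [],
    "binding": ["molecular_function"],
    "catalytic activity": ["molecular_function"],
    "nucleic acid binding": ["binding"],
    "DNA binding": ["nucleic acid binding"],
    "RNA binding": ["nucleic acid binding"],
    "kinase activity": ["catalytic activity"],
}


def ancestors(term):
    """Return the term together with all of its ancestors in PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def hierarchy_similarity(a, b):
    """Jaccard overlap of the two ancestor sets (assumed stand-in measure)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def filter_extracted(extracted, uncurated, min_sim=0.5):
    """Keep extracted terms whose similarity to the uncurated term is high enough."""
    return [t for t in extracted if hierarchy_similarity(t, uncurated) >= min_sim]


kept = filter_extracted(
    ["RNA binding", "kinase activity", "DNA binding"], "DNA binding"
)
print(kept)
```

In this toy fragment a sibling term under the same parent passes the filter, while a term from an unrelated branch is discarded, which is the kind of restriction the abstract credits for the high precision.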

Figures

Figure 1. List of documents related to a given protein. The list is sorted by the most similar term extracted from each document. The curator can use the Extract option to see the extracted terms together with the evidence text. By default GOAnnotator uses only the abstract, but the curator can use the AddText option to replace or insert text.
Figure 2. Extracted GO terms. For each uncurated annotation, GOAnnotator shows the similar GO terms extracted from a sentence of the selected document. If any of the sentences provides correct evidence for the uncurated annotation, or if the evidence supports a GO term similar to the one in the uncurated annotation, the curator can use the Add option to store the annotation together with the document reference, the evidence codes and any comments.
