GOAnnotator: linking protein GO annotations to evidence text
- PMID: 17181854
- PMCID: PMC1769513
- DOI: 10.1186/1747-5333-1-19
Abstract
Background: Annotating proteins with Gene Ontology (GO) terms is an ongoing and complex task. Manual GO annotation is precise and valuable, but it is time-consuming. As a result, most proteins carry uncurated annotations that were generated automatically rather than curated ones. Text-mining systems that use the literature for automatic annotation have been proposed, but they do not meet the high quality expectations of curators.
Results: In this paper we describe an approach that links uncurated annotations to text extracted from the literature. Text is selected according to its similarity to the GO term of the uncurated annotation. Besides substantiating the uncurated annotations, the extracted texts also lead to novel annotations. In addition, the approach uses the GO hierarchy to achieve high precision. The approach is integrated into GOAnnotator, a tool that assists the curation process for GO annotation of UniProt proteins.
Conclusion: GO curators assessed GOAnnotator on a set of 66 distinct UniProt/SwissProt proteins with uncurated annotations. GOAnnotator provided correct evidence text at 93% precision. This high precision results from using the GO hierarchy to select only GO terms similar to the GO terms of uncurated annotations in GOA. Our approach is the first to achieve high precision, which is crucial for supporting GO curators efficiently. GOAnnotator was implemented as a web tool that is freely available at http://xldb.di.fc.ul.pt/rebil/tools/goa/.
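Illustrative sketch: the abstract does not specify the similarity measures GOAnnotator uses, so the following Python example only illustrates the general idea under stated assumptions. It scores sentences against GO term names by word overlap (a stand-in for the text-to-term similarity) and keeps only terms that are identical to, or an ancestor or descendant of, the uncurated term (a stand-in for the GO-hierarchy filter). The hierarchy fragment, the function names, and the 0.6 threshold are all hypothetical.

```python
import re

# Hypothetical fragment of a GO-like hierarchy: term name -> parent term names.
PARENTS = {
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": [],
    "dna binding": ["binding"],
    "binding": [],
}

def tokens(text):
    """Lower-cased set of alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def text_similarity(sentence, term):
    """Fraction of the term's words that occur in the sentence (assumed measure)."""
    term_words = tokens(term)
    return len(term_words & tokens(sentence)) / len(term_words)

def ancestors(term):
    """All terms reachable by following parent links upward."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def related(term_a, term_b):
    """True if the terms are identical or one is an ancestor of the other."""
    return (term_a == term_b
            or term_a in ancestors(term_b)
            or term_b in ancestors(term_a))

def evidence_for(uncurated_term, sentences, threshold=0.6):
    """Return (sentence, matched term, score) for hierarchy-related terms only."""
    hits = []
    for sentence in sentences:
        for term in PARENTS:
            if not related(term, uncurated_term):
                continue  # hierarchy filter: skip terms unrelated to the annotation
            score = text_similarity(sentence, term)
            if score >= threshold:
                hits.append((sentence, term, score))
    return sorted(hits, key=lambda hit: -hit[2])

sentences = [
    "The protein shows strong kinase activity in vitro.",
    "It localises to the nucleus and shows DNA binding.",
]
hits = evidence_for("protein kinase activity", sentences)
for sentence, term, score in hits:
    print(f"{score:.2f}  {term}  <-  {sentence}")
```

In this toy run the second sentence matches "dna binding" well, but it is not returned for the kinase annotation because that term lies outside the annotation's branch of the hierarchy. Restricting candidates in this way trades recall for precision, which is the design goal the abstract emphasises.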
Similar articles
- An evaluation of GO annotation retrieval for BioCreAtIvE and GOA. BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S17. doi: 10.1186/1471-2105-6-S1-S17. Epub 2005 May 24. PMID: 15960829. Free PMC article.
- Evaluation of BioCreAtIvE assessment of task 2. BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-6-S1-S16. Epub 2005 May 24. PMID: 15960828. Free PMC article.
- The UniProt-GO Annotation database in 2011. Nucleic Acids Res. 2012 Jan;40(Database issue):D565-70. doi: 10.1093/nar/gkr1048. Epub 2011 Nov 28. PMID: 22123736. Free PMC article.
- How to learn about gene function: text-mining or ontologies? Methods. 2015 Mar;74:3-15. doi: 10.1016/j.ymeth.2014.07.004. Epub 2014 Aug 1. PMID: 25088781. Review.
- Deep Question Answering for protein annotation. Database (Oxford). 2015 Sep 16;2015:bav081. doi: 10.1093/database/bav081. Print 2015. PMID: 26384372. Free PMC article. Review.
Cited by
- False positive reduction in protein-protein interaction predictions using gene ontology annotations. BMC Bioinformatics. 2007 Jul 23;8:262. doi: 10.1186/1471-2105-8-262. PMID: 17645798. Free PMC article.
- Multi-label literature classification based on the Gene Ontology graph. BMC Bioinformatics. 2008 Dec 8;9:525. doi: 10.1186/1471-2105-9-525. PMID: 19063730. Free PMC article.
- A weighted multipath measurement based on gene ontology for estimating gene products similarity. J Comput Biol. 2014 Dec;21(12):964-74. doi: 10.1089/cmb.2014.0143. PMID: 25229994. Free PMC article.
- NOA: a novel Network Ontology Analysis method. Nucleic Acids Res. 2011 Jul;39(13):e87. doi: 10.1093/nar/gkr251. Epub 2011 May 4. PMID: 21543451. Free PMC article.
- Improving classification in protein structure databases using text mining. BMC Bioinformatics. 2009 May 5;10:129. doi: 10.1186/1471-2105-10-129. PMID: 19416501. Free PMC article.