Discovering patterns to extract protein-protein interactions from the literature: Part II
- PMID: 15890744
- DOI: 10.1093/bioinformatics/bti493
Abstract
Motivation: An enormous number of protein-protein interaction relationships are buried in millions of research articles published over the years, and the number is growing. Rediscovering them automatically is a challenging bioinformatics task. Solutions to this problem also reach far beyond bioinformatics.
Results: We study a new approach that involves automatically discovering English expression patterns, optimizing them and using them to extract protein-protein interactions. In a sister paper, we described how to generate English expression patterns related to protein-protein interactions, and this approach alone has already achieved precision and recall rates significantly higher than those of other automatic systems. This paper continues to present our theory, focusing on how to improve the patterns. A minimum description length (MDL)-based pattern-optimization algorithm is designed to reduce and merge patterns. This has significantly increased generalization power, and hence the recall and precision rates, as confirmed by our experiments.
Availability: http://spies.cs.tsinghua.edu.cn.
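The abstract states only that an MDL-based algorithm reduces and merges patterns; it gives no encoding details. The following is a minimal toy sketch of that general idea, not the authors' algorithm. It assumes that patterns are fixed-length token tuples, that two patterns differing in exactly one slot can be merged into one pattern with a wildcard, and that a merge is accepted only when it lowers a crude two-part description length (bits to write down the patterns plus bits to encode the sentences given the patterns). The names `WILDCARD`, `merge`, `description_length`, `optimize`, and the bit-cost scheme are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations

WILDCARD = "*"  # hypothetical symbol for a generalized slot


def matches(pattern, sentence):
    """True if the pattern matches the token sequence slot by slot."""
    return len(pattern) == len(sentence) and all(
        p == WILDCARD or p == s for p, s in zip(pattern, sentence)
    )


def merge(a, b):
    """Merge two equal-length patterns that differ in exactly one slot.

    Returns the merged pattern with a wildcard in that slot, or None.
    """
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + (WILDCARD,) + a[i + 1:]


def description_length(patterns, sentences, vocab_size):
    """Toy two-part code length in bits: L(model) + L(data | model)."""
    token_bits = math.log2(vocab_size + 1)  # +1 for the wildcard symbol
    model_bits = sum(len(p) for p in patterns) * token_bits
    index_bits = math.log2(len(patterns)) if len(patterns) > 1 else 0.0
    data_bits = 0.0
    for s in sentences:
        matching = [p for p in patterns if matches(p, s)]
        if matching:
            # Name the most specific matching pattern, then spell out
            # the tokens that fill its wildcard slots.
            best = min(matching, key=lambda p: p.count(WILDCARD))
            data_bits += index_bits + best.count(WILDCARD) * token_bits
        else:
            # Unmatched sentences are encoded token by token.
            data_bits += len(s) * token_bits
    return model_bits + data_bits


def optimize(patterns, sentences, vocab_size):
    """Greedily apply merges while the description length decreases."""
    patterns = set(patterns)
    current = description_length(patterns, sentences, vocab_size)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(patterns), 2):
            merged = merge(a, b)
            if merged is None:
                continue
            candidate = (patterns - {a, b}) | {merged}
            cost = description_length(candidate, sentences, vocab_size)
            if cost < current:
                patterns, current, improved = candidate, cost, True
                break
    return patterns, current


# Illustrative data only; these sentences are invented for the sketch.
seed_patterns = {
    ("PROT", "binds", "PROT"),
    ("PROT", "activates", "PROT"),
    ("PROT", "inhibits", "PROT"),
}
sentences = [tuple(p) for p in sorted(seed_patterns)]
initial_cost = description_length(seed_patterns, sentences, vocab_size=4)
final_patterns, final_cost = optimize(seed_patterns, sentences, vocab_size=4)
print(sorted(final_patterns), round(initial_cost, 2), round(final_cost, 2))
```

In this toy setting the three seed patterns collapse into a single wildcard pattern because the saving in model bits outweighs the extra bits needed to spell out the wildcard filler. A realistic system would also need some guard against over-generalization, for example restricting what a wildcard may match, which this sketch does not attempt.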
Similar articles
- Discovering patterns to extract protein-protein interactions from full texts. Bioinformatics. 2004 Dec 12;20(18):3604-12. doi: 10.1093/bioinformatics/bth451. Epub 2004 Jul 29. PMID: 15284092
- Text similarity: an alternative way to search MEDLINE. Bioinformatics. 2006 Sep 15;22(18):2298-304. doi: 10.1093/bioinformatics/btl388. Epub 2006 Aug 22. PMID: 16926219
- Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics. 2004 Mar 22;20(5):604-11. doi: 10.1093/bioinformatics/btg452. Epub 2004 Jan 22. PMID: 15033866
- Extracting interactions between proteins from the literature. J Biomed Inform. 2008 Apr;41(2):393-407. doi: 10.1016/j.jbi.2007.11.008. Epub 2007 Dec 15. PMID: 18207462 Review.
- Hairpins in bookstacks: information retrieval from biomedical text. Brief Bioinform. 2005 Sep;6(3):222-38. doi: 10.1093/bib/6.3.222. PMID: 16212771 Review.
Cited by
- A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature. PLoS Comput Biol. 2010 Jul 1;6(7):e1000837. doi: 10.1371/journal.pcbi.1000837. PMID: 20617200 Free PMC article.
- PPInterFinder--a mining tool for extracting causal relations on human proteins from literature. Database (Oxford). 2013 Jan 15;2013:bas052. doi: 10.1093/database/bas052. Print 2013. PMID: 23325628 Free PMC article.
- The Text-mining based PubChem Bioassay neighboring analysis. BMC Bioinformatics. 2010 Nov 8;11:549. doi: 10.1186/1471-2105-11-549. PMID: 21059237 Free PMC article.
- PCorral--interactive mining of protein interactions from MEDLINE. Database (Oxford). 2013 May 2;2013:bat030. doi: 10.1093/database/bat030. Print 2013. PMID: 23640984 Free PMC article.
- Extraction of Protein-Protein Interaction from Scientific Articles by Predicting Dominant Keywords. Biomed Res Int. 2015;2015:928531. doi: 10.1155/2015/928531. Epub 2015 Dec 10. PMID: 26783534 Free PMC article.