G-Bean: an ontology-graph based web tool for biomedical literature retrieval
- PMID: 25474588
- PMCID: PMC4243180
- DOI: 10.1186/1471-2105-15-S12-S1
Abstract
Background: Most people currently use NCBI's PubMed to search the MEDLINE database, an important bibliographic source for life science and biomedical information. However, PubMed has drawbacks that make it difficult for users, especially non-experts, to find publications relevant to their individual search intentions. To address these drawbacks, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently.
Methods: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph, and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected to expand the initial query and retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in order of their relevance to the already selected articles.
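The query expansion step in (2) can be illustrated with a minimal sketch of Personalized PageRank over a toy concept graph. This is not the authors' implementation: the edge list, the concept names, the damping factor of 0.85, and the function names `personalized_pagerank` and `expand_query` are all assumptions made for illustration, and the TF-IDF re-ranking step is omitted. The random walk restarts only at the query concepts, so concepts close to the query in the graph receive higher scores.

```python
from collections import defaultdict

# Hypothetical toy ontology graph; the real graph merges four UMLS vocabularies.
EDGES = [
    ("asthma", "bronchial disease"),
    ("asthma", "wheezing"),
    ("bronchial disease", "respiratory tract disease"),
    ("respiratory tract disease", "pneumonia"),
    ("fracture", "bone injury"),
]

def personalized_pagerank(edges, seeds, damping=0.85, iterations=100):
    """Power iteration on an undirected graph; restarts land only on seeds.

    Assumes every seed concept appears in at least one edge.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)
    # Restart distribution: uniform over the query (seed) concepts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            # Each node passes its damped score equally to its neighbors.
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank

def expand_query(edges, seeds, top_k=2):
    """Append the top_k highest-scoring non-seed concepts to the query."""
    scores = personalized_pagerank(edges, set(seeds))
    candidates = [(c, s) for c, s in scores.items() if c not in seeds]
    candidates.sort(key=lambda cs: (-cs[1], cs[0]))  # score desc, then name
    return list(seeds) + [c for c, _ in candidates[:top_k]]

if __name__ == "__main__":
    print(expand_query(EDGES, ["asthma"]))
```

In this toy graph the concepts adjacent to "asthma" are chosen for expansion, while concepts in the unconnected "fracture" component score zero. G-Bean as described selects the top 500 concepts rather than the two used here.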
Results: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed when these queries are used to search the MEDLINE database. For some OHSUMED queries, PubMed returned no search results at all because it failed to form an appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php.
Conclusions: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles in the MEDLINE database that meet the user's information needs.
Similar articles
- User centered and ontology based information retrieval system for life sciences. BMC Bioinformatics. 2012 Jan 25;13 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2105-13-S1-S4. PMID: 22373375 Free PMC article.
- Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2004 Nov;2(11):e309. doi: 10.1371/journal.pbio.0020309. Epub 2004 Sep 21. PMID: 15383839 Free PMC article.
- OvidSP Medline-to-PubMed search filter translation: a methodology for extending search filter range to include PubMed's unique content. BMC Med Res Methodol. 2013 Jul 2;13:86. doi: 10.1186/1471-2288-13-86. PMID: 23819658 Free PMC article.
- Where to search top-K biomedical ontologies? Brief Bioinform. 2019 Jul 19;20(4):1477-1491. doi: 10.1093/bib/bby015. PMID: 29579141 Free PMC article. Review.
- HEALTH GeoJunction: place-time-concept browsing of health publications. Int J Health Geogr. 2010 May 18;9:23. doi: 10.1186/1476-072X-9-23. PMID: 20482806 Free PMC article. Review.
Cited by
- PaperBot: open-source web-based search and metadata organization of scientific literature. BMC Bioinformatics. 2019 Jan 24;20(1):50. doi: 10.1186/s12859-019-2613-z. PMID: 30678631 Free PMC article.
- MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed. Front Big Data. 2022 Oct 19;5:965619. doi: 10.3389/fdata.2022.965619. eCollection 2022. PMID: 36338335 Free PMC article.
- Triage of documents containing protein interactions affected by mutations using an NLP based machine learning approach. BMC Genomics. 2020 Nov 10;21(1):773. doi: 10.1186/s12864-020-07185-7. PMID: 33167858 Free PMC article.
- Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation. J Med Internet Res. 2024 May 30;26:e52655. doi: 10.2196/52655. PMID: 38814687 Free PMC article.