BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S1.
doi: 10.1186/1471-2105-15-S12-S1. Epub 2014 Nov 6.

G-Bean: an ontology-graph based web tool for biomedical literature retrieval


James Z Wang et al. BMC Bioinformatics. 2014.

Abstract

Background: Most people currently use NCBI's PubMed to search the MEDLINE database, an important bibliographic source for life science and biomedical information. However, PubMed has drawbacks that make it difficult to find publications matching a user's individual intention, especially for non-expert users. To address these drawbacks, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently.

Methods: G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy generates the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm computes concept relevance in this ontology graph, and the Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected to expand the initial query and retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the selections to determine the user's true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in order of their relevance to the already selected articles.
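The query-expansion step (2) can be illustrated with a minimal sketch. This is not G-Bean's implementation: the toy ontology edges, the function names `personalized_pagerank` and `expand_query`, the damping factor of 0.85, and the iteration count are all assumptions made for illustration, and the TF-IDF re-ranking of concepts described above is omitted. The sketch only shows how Personalized PageRank, restarted at the concepts found in a query, ranks neighbouring concepts in an undirected ontology graph.

```python
from collections import defaultdict


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Power-iteration Personalized PageRank on an undirected graph.

    edges: iterable of (concept_a, concept_b) pairs (hypothetical ontology links)
    seeds: set of concepts matched in the query; every seed is assumed to
           appear in at least one edge. Restart mass is split evenly over them.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Restart distribution: all probability mass sits on the query concepts.
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)

    for _ in range(iterations):
        # Each node keeps a (1 - damping) share of its restart mass ...
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        # ... and spreads the damped remainder evenly over its neighbours.
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


def expand_query(edges, seeds, top_k=3):
    """Return the top_k concepts by Personalized PageRank score."""
    rank = personalized_pagerank(edges, seeds)
    return sorted(rank, key=lambda n: (-rank[n], n))[:top_k]


# Toy ontology fragment (illustrative only, not taken from UMLS).
toy_edges = [
    ("asthma", "bronchial disease"),
    ("bronchial disease", "respiratory disease"),
    ("asthma", "wheezing"),
    ("respiratory disease", "pneumonia"),
]
scores = personalized_pagerank(toy_edges, {"asthma"})
expanded = expand_query(toy_edges, {"asthma"}, top_k=3)
```

In this toy graph the seed concept keeps the highest score and concepts closer to it outrank distant ones. G-Bean keeps the top 500 concepts after TF-IDF re-ranking, whereas the sketch simply truncates the raw PageRank ordering at `top_k`.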

Results: Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed when these queries are used to search the MEDLINE database. For some OHSUMED queries PubMed returned no results at all, because it failed to form an appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php.

Conclusions: G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.


Figures

Figure 1. Major steps of the search process in G-Bean.
Figure 2. The architecture of G-Bean.
Figure 3. Screenshot of the user interface of G-Bean.
Figure 4. Major components of the G-Bean server.
