Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study
- PMID: 32496201
- PMCID: PMC7303830
- DOI: 10.2196/12799
Abstract
Background: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval.
Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …, for each of the 28,313 Medical Subject Heading (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure).
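The query form described in the objective can be illustrated with a small helper. This is a minimal sketch, not the authors' implementation: the function name `build_expanded_query` and the case-insensitive removal of duplicate synonyms are assumptions introduced here for illustration.

```python
def build_expanded_query(preferred_term: str, synonyms: list[str]) -> str:
    """Build a query of the form
    "preferred term"[MH] OR "preferred term"[TIAB] OR "synonym 1"[TIAB] OR ...

    Hypothetical helper: duplicate synonyms (case-insensitive) are skipped,
    which is an assumption and not something the abstract specifies.
    """
    clauses = [f'"{preferred_term}"[MH]', f'"{preferred_term}"[TIAB]']
    seen = {preferred_term.lower()}
    for synonym in synonyms:
        key = synonym.lower()
        if key in seen:
            continue  # avoid repeating a term already in the query
        seen.add(key)
        clauses.append(f'"{synonym}"[TIAB]')
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Illustrative synonym list only; real synonyms would come from MeSH, UMLS, or CISMeF.
    print(build_expanded_query("Asthma", ["Bronchial Asthma"]))
```

Each strategy would supply a different synonym list for the same preferred term, so the same helper could serve all three expansions.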
Methods: Three semantic expansion strategies were assessed. They differed in the synonyms used to build the queries: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). Precision, recall, and F-measure were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. Computing the metrics required three counts: (A) the number of all relevant citations, using National Library of Medicine indexing as the gold standard ("preferred term"[MH]); (B) the number of citations retrieved by the added terms ("synonym 1"[TIAB] OR "synonym 2"[TIAB] OR …); and (C) the number of relevant citations retrieved by the added terms (the previous two queries combined with an "AND" operator). The metrics were computed programmatically for each strategy using each of the 28,313 MeSH descriptors as a "preferred term," corresponding to 239,724 different queries built and sent to the PubMed application programming interface. The four search strategies were ranked and compared for each metric.
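The three counts A, B, and C are enough to derive the metrics. The sketch below assumes the standard information retrieval definitions (precision = C/B, recall = C/A, F-measure as the harmonic mean of the two); the abstract does not spell out these formulas, and the names `evaluation_queries`, `compute_metrics`, and `Metrics` are hypothetical. No request is sent to PubMed here; the counts are passed in as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    f_measure: float


def evaluation_queries(preferred_term: str, added_terms: list[str]) -> tuple[str, str, str]:
    """Return the three query strings whose hit counts give A, B, and C."""
    gold = f'"{preferred_term}"[MH]'                           # count A
    added = " OR ".join(f'"{t}"[TIAB]' for t in added_terms)   # count B
    both = f"({gold}) AND ({added})"                           # count C
    return gold, added, both


def compute_metrics(a: int, b: int, c: int) -> Metrics:
    """Assumed standard definitions; zero denominators yield 0.0."""
    precision = c / b if b else 0.0
    recall = c / a if a else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f_measure)


if __name__ == "__main__":
    # Made-up counts for illustration, not results from the study.
    print(evaluation_queries("Asthma", ["Bronchial Asthma"]))
    print(compute_metrics(a=200, b=100, c=50))
```

Returning 0.0 when a denominator is zero is a design choice made for this sketch; a descriptor with no indexed citations could equally be excluded from the averages.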
Results: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second-best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.
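A per-descriptor comparison with a cutoff can be sketched as follows. The abstract does not say whether the 5% cutoff is an absolute or a relative difference; this example assumes an absolute difference of 0.05 on a 0 to 1 scale. The function name `count_wins` and the descriptor identifiers and values are invented for illustration.

```python
def count_wins(scores_a: dict[str, float], scores_b: dict[str, float],
               cutoff: float = 0.05) -> int:
    """Count descriptors where strategy A beats strategy B by more than `cutoff`.

    Assumption: the cutoff is an absolute difference between metric values.
    Only descriptors present in both mappings are compared.
    """
    shared = scores_a.keys() & scores_b.keys()
    return sum(1 for d in shared if scores_a[d] - scores_b[d] > cutoff)


if __name__ == "__main__":
    # Hypothetical precision values per descriptor, not data from the study.
    strategy_a = {"D1": 0.60, "D2": 0.52, "D3": 0.30}
    strategy_b = {"D1": 0.50, "D2": 0.50, "D3": 0.40}
    print(count_wins(strategy_a, strategy_b))
    print(count_wins(strategy_b, strategy_a))
```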
Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm that there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user's objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize one of the three main metrics (precision, recall, or F-measure).
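Choosing an expansion for one descriptor according to the user's objective amounts to a lookup over precomputed metrics. The following is a minimal sketch of that idea, not a description of the authors' interface; the function name `best_strategy`, the data layout, and all numeric values are hypothetical.

```python
def best_strategy(scores: dict[str, dict[str, float]], metric: str) -> str:
    """Return the strategy with the highest value for `metric`.

    `scores` maps a strategy name to its metrics for ONE descriptor,
    e.g. {"MeSH": {"precision": 0.7, "recall": 0.3, "f_measure": 0.42}}.
    Ties are resolved in favour of the strategy listed first.
    """
    if not scores:
        raise ValueError("no strategies to compare")
    return max(scores, key=lambda name: scores[name][metric])


if __name__ == "__main__":
    # Invented values for a single descriptor, for illustration only.
    descriptor_scores = {
        "ATM": {"precision": 0.20, "recall": 0.25, "f_measure": 0.22},
        "MeSH": {"precision": 0.70, "recall": 0.30, "f_measure": 0.42},
        "UMLS": {"precision": 0.45, "recall": 0.55, "f_measure": 0.50},
        "CISMeF": {"precision": 0.50, "recall": 0.50, "f_measure": 0.49},
    }
    for metric in ("precision", "recall", "f_measure"):
        print(metric, "->", best_strategy(descriptor_scores, metric))
```

A user who wants few irrelevant citations would ask for the precision-maximizing expansion, while one preparing an exhaustive search would ask for recall.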
Keywords: MEDLINE; Medical Subject Headings; PubMed; bibliographic database; information retrieval; literature search; precision; recall; search strategy; thesaurus.
©Clément R Massonnaud, Gaétan Kerdelhué, Julien Grosjean, Romain Lelong, Nicolas Griffon, Stefan J Darmoni. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 04.06.2020.
Conflict of interest statement
Conflicts of Interest: None declared.
Similar articles
- Performance evaluation of three semantic expansions to query PubMed. Health Info Libr J. 2021 Jun;38(2):113-124. doi: 10.1111/hir.12291. Epub 2019 Dec 14. PMID: 31837099
- G-Bean: an ontology-graph based web tool for biomedical literature retrieval. BMC Bioinformatics. 2014;15 Suppl 12(Suppl 12):S1. doi: 10.1186/1471-2105-15-S12-S1. Epub 2014 Nov 6. PMID: 25474588 Free PMC article.
- Improving information retrieval using Medical Subject Headings Concepts: a test case on rare and chronic diseases. J Med Libr Assoc. 2012 Jul;100(3):176-83. doi: 10.3163/1536-5050.100.3.007. PMID: 22879806 Free PMC article.
- Sensitivity and predictive value of 15 PubMed search strategies to answer clinical questions rated against full systematic reviews. J Med Internet Res. 2012 Jun 12;14(3):e85. doi: 10.2196/jmir.2021. PMID: 22693047 Free PMC article. Review.
- Orthopaedic literature and MeSH. Clin Orthop Relat Res. 2010 Oct;468(10):2621-6. doi: 10.1007/s11999-010-1387-4. PMID: 20623263 Free PMC article. Review.
Cited by
- Representation of Social Determinants of Health terminology in medical subject headings: impact of added terms. J Am Med Inform Assoc. 2024 Nov 1;31(11):2595-2604. doi: 10.1093/jamia/ocae191. PMID: 39047296 Free PMC article.