Deep learning to refine the identification of high-quality clinical research articles from the biomedical literature: Performance evaluation
- PMID: 37164244
- DOI: 10.1016/j.jbi.2023.104384
Abstract
Background: Identifying practice-ready evidence-based journal articles in medicine is a challenge due to the sheer volume of biomedical research publications. Newer approaches to support evidence discovery apply deep learning techniques to improve the efficiency and accuracy of classifying sound evidence.
Objective: To determine how well deep learning models using variants of Bidirectional Encoder Representations from Transformers (BERT) identify high-quality evidence with high clinical relevance from the biomedical literature for consideration in clinical practice.
Methods: We fine-tuned variations of BERT models (BERTBASE, BioBERT, BlueBERT, and PubMedBERT) and compared their performance in classifying articles based on methodological quality criteria. The dataset used for fine-tuning included titles and abstracts of >160,000 PubMed records from 2012 to 2020 that were of interest to human health and had been manually labeled according to whether they met established critical appraisal criteria for methodological rigor. The data were randomly divided into 80:10:10 sets for training, validation, and testing. In addition to using the full unbalanced set, the training data were randomly undersampled into four balanced datasets to assess performance and select the best performing model. For each of the four sets, one model that maintained sensitivity (recall) at ≥99% was selected, and the selected models were ensembled. The best performing model was evaluated in a prospective, blinded test and applied to an established reference standard, the Clinical Hedges dataset.
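The data preparation described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy records, the `label` field, and the fixed random seed are all hypothetical, and the abstract does not say whether the split was stratified or whether the four balanced sets were drawn independently, so both choices below are assumptions.

```python
import random

def split_80_10_10(records, seed=0):
    """Shuffle records and split them 80:10:10 into train/validation/test."""
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

def undersample_balanced(train, n_sets=4, seed=0):
    """Build n_sets balanced training sets by randomly undersampling
    the majority class (assumption: each set is drawn independently)."""
    rng = random.Random(seed)
    pos = [r for r in train if r["label"] == 1]
    neg = [r for r in train if r["label"] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced_sets = []
    for _ in range(n_sets):
        sampled = rng.sample(majority, len(minority))  # without replacement
        balanced = minority + sampled
        rng.shuffle(balanced)
        balanced_sets.append(balanced)
    return balanced_sets

# Toy data: 1,000 hypothetical records, 20% labeled as meeting criteria.
records = [{"id": i, "label": 1 if i % 5 == 0 else 0} for i in range(1000)]
train, val, test = split_80_10_10(records)
balanced_sets = undersample_balanced(train)
```

Only the training portion is undersampled in this sketch; the validation and test portions keep their original class distribution so that recall and specificity are measured on realistic data.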
Results: In training, three of the four selected best performing models were trained using BioBERTBASE. The ensembled model did not boost performance compared with the best individual model. Hence, a single BioBERT-based model (named DL-PLUS) was selected for further testing because it was computationally more efficient. In a prospective evaluation conducted with blinded research associates, the model had high recall (>99%) and 60% to 77% specificity, and it saved >60% of the work required to identify high-quality articles.
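The reported metrics can be made concrete with a small worked example. The Python sketch below is illustrative only: the labels and scores are made up, the helper names are hypothetical, and "work saved" is assumed here to mean the fraction of articles scored below the decision threshold (and so not sent for manual review), which the abstract does not define.

```python
def recall_specificity(labels, scores, threshold):
    """Recall and specificity when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, specificity

def highest_threshold_with_recall(labels, scores, min_recall=0.99):
    """Return the highest observed score that keeps recall >= min_recall."""
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_specificity(labels, scores, t)
        if recall >= min_recall:
            return t
    return min(scores)  # fallback: predict everything positive

def work_saved(scores, threshold):
    """Assumed definition: share of articles filtered out before manual review."""
    return sum(1 for s in scores if s < threshold) / len(scores)

# Made-up example: 3 high-quality articles among 10.
labels = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3, 0.6, 0.05, 0.4, 0.15]

threshold = highest_threshold_with_recall(labels, scores)
recall, specificity = recall_specificity(labels, scores, threshold)
saved = work_saved(scores, threshold)
```

Choosing the highest threshold that still satisfies the recall constraint maximizes specificity under that constraint, which mirrors the trade-off the abstract reports: recall is held at or above 99% and specificity is whatever remains achievable.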
Conclusions: Deep learning using pretrained language models and a large dataset of classified articles produced models with improved specificity while maintaining >99% recall. The resulting DL-PLUS model identifies high-quality, clinically relevant articles from PubMed at the time of publication. The model improves the efficiency of a literature surveillance program, which allows for faster dissemination of appraised research.
Keywords: Bioinformatics; Evidence-based medicine; Literature retrieval; Machine learning; Medical informatics; Natural Language Processing.
Copyright © 2023 The Author(s). Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.