Prospective validation of text categorization filters for identifying high-quality, content-specific articles in MEDLINE
- PMID: 17238292
- PMCID: PMC1839419
Abstract
In prior work, we introduced a machine learning method to identify high-quality MEDLINE documents in internal medicine. The performance of the original filter models, built from a 1998-2000 corpus, was not assessed directly on years outside that range. Evaluating the original filter models on current corpora is crucial for three reasons: to validate them for use in current years, to verify that the model fitting and model error estimation procedures do not over-fit the models, and to confirm the consistency of the chosen ACP Journal Club (ACPJ) gold standard (i.e., that ACPJ editorial policies and criteria are stable over time). Our prospective validation results indicated that in the categories of treatment, etiology, diagnosis, and prognosis, the original machine learning filter models built from the 1998-2000 corpora maintained their discriminatory performance, with areas under the curve of 0.97, 0.97, 0.94, and 0.94, respectively, when applied to a 2005 corpus. The ACPJ is a stable, reliable gold standard, and the machine learning methodology provides robust models and model performance estimates. Machine learning filter models built with 1998-2000 corpora can be applied to identify high-quality articles in recent years.
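The area under the curve reported above can be read as the probability that a filter model scores a randomly chosen high-quality article above a randomly chosen other article. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function name `auc` and the toy labels and scores are hypothetical and stand in for a later-year corpus labeled against a gold standard.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney statistic).

    labels: sequence of 0/1, where 1 marks an article the gold standard includes
    scores: sequence of model scores, where higher means more likely included
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: scores from a model trained on earlier years,
# applied unchanged to articles from a later year.
labels_later_year = [1, 0, 1, 0, 0, 1]
scores_later_year = [0.9, 0.2, 0.7, 0.8, 0.1, 0.6]

print(f"AUC on later-year sample: {auc(labels_later_year, scores_later_year):.2f}")
```

In a prospective validation of this kind, the model is left untouched and only the evaluation corpus changes, so a later-year AUC close to the original estimate suggests that neither the model nor the original error estimate was over-fit.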