A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents
- PMID: 16622165
- PMCID: PMC1513679
- DOI: 10.1197/jamia.M2031
Abstract
Objective: This study evaluates the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares it with that of citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).
Design: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors.
Measurements: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
Results: For all three gold standards and tasks, GSS-ML filters outperformed citation count, impact factors, and NS-ML filters. Combinations of content with impact factor or citation count produced no or negligible improvements to the GSS machine learning filters.
Conclusions: These experiments provide evidence that an information retrieval filter focused on a particular retrieval task and its corresponding gold standard has to be built specifically for that task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add little or no discriminatory performance when combined with content. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here can be attributed to the use of machine learning filters built for a different gold standard and task.
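The measurement described above (area under the ROC curve, estimated with n-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' code: the function names `auc` and `cross_validated_auc`, the interleaved fold assignment, and the toy citation counts and labels are all assumptions made for illustration. The `fit` argument stands in for any learner that returns a scoring function (a support vector machine in the study); the example passes a baseline that ignores the training data and ranks articles by raw citation count.

```python
from typing import Callable, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the fraction of (positive, negative) pairs in which the
    positive item receives the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(
    features: Sequence,
    labels: Sequence[int],
    fit: Callable[[List, List[int]], Callable[[object], float]],
    n_folds: int,
) -> float:
    """Mean held-out AUC over n_folds; fold k holds out items k, k+n_folds, ..."""
    fold_aucs = []
    for k in range(n_folds):
        test_idx = [i for i in range(len(labels)) if i % n_folds == k]
        train_idx = [i for i in range(len(labels)) if i % n_folds != k]
        # Train on the remaining folds, then score only the held-out items.
        score = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        fold_aucs.append(auc([score(features[i]) for i in test_idx],
                             [labels[i] for i in test_idx]))
    return sum(fold_aucs) / n_folds


# Toy data (invented): citation count per article, and whether the article
# belongs to a hypothetical gold standard (1) or not (0).
citations = [50, 3, 9, 20, 7, 7]
labels = [1, 0, 1, 0 + 1, 1 - 1, 0]
labels = [1, 0, 0, 1, 1, 0]


def citation_baseline(train_x, train_y):
    """Baseline 'model' that learns nothing and ranks by citation count."""
    return lambda x: float(x)


pooled = auc(citations, labels)
cv_mean = cross_validated_auc(citations, labels, citation_baseline, n_folds=2)
```

Swapping `citation_baseline` for a function that trains a classifier on article content and returns its decision score would give the corresponding estimate for a machine learning filter, so both kinds of filter can be compared on the same folds.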