Comparative Study
J Am Med Inform Assoc. 2006 Jul-Aug;13(4):446-55.
doi: 10.1197/jamia.M2031. Epub 2006 Apr 18.

A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents

Yindalon Aphinyanaphongs et al. J Am Med Inform Assoc. 2006 Jul-Aug.

Abstract

Objective: The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares it with that of citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).

Design: Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors.

Measurements: Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
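As an illustration of the measurement described above, the following is a minimal sketch of how the area under the ROC curve can be computed for any ranking score (a citation count, an impact factor, or a classifier output) and averaged over test folds. It is not the authors' code: the function names, the interleaved fold assignment, and the toy citation counts and labels are all assumptions made for this example, and no model is trained here.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Equals the probability that a randomly chosen positive document scores
    higher than a randomly chosen negative one, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fold_indices(n_items: int, n_folds: int) -> List[List[int]]:
    """Assign indices 0..n_items-1 to n_folds interleaved test folds."""
    return [list(range(k, n_items, n_folds)) for k in range(n_folds)]


def cross_validated_auc(
    scores: Sequence[float], labels: Sequence[int], n_folds: int
) -> float:
    """Mean AUC over test folds for a fixed scoring rule (no training step)."""
    per_fold = []
    for fold in fold_indices(len(scores), n_folds):
        per_fold.append(auc([scores[i] for i in fold], [labels[i] for i in fold]))
    return sum(per_fold) / len(per_fold)


# Toy data, invented for illustration only (not taken from the study).
citation_counts = [12, 3, 7, 0, 25, 1, 4, 9]
labels = [1, 0, 0, 1, 1, 0, 1, 0]  # 1 = in the gold standard, 0 = not

print(auc(citation_counts, labels))
print(cross_validated_auc(citation_counts, labels, n_folds=2))
```

With a trained filter, the scores for each test fold would come from a model fitted on the remaining folds; the averaging step stays the same.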

Results: For all three gold standards and tasks, GSS-ML filters outperformed citation count, impact factors, and NS-ML filters. Combining content with impact factor or citation count produced no or negligible improvement over the GSS-ML filters.

Conclusions: These experiments provide evidence that when building information retrieval filters focused on a retrieval task and corresponding gold standard, the filter models have to be built specifically for this task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add marginal value to discriminatory performance. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here is attributed to using machine learning filters built for a different gold standard and task.

Figures

Figure 1
Average HITS curves used on SSOAB corpus. The GSS SSOAB model returns the most true positive documents in the first 150 articles. Citation count and the NS ACPJ Treatment Model applied to the SSOAB corpus return fewer true positive documents in the top 150 returns. The SSOAB corpus was composed of 431 positives and 7,379 negatives.
Figure 2
Average precision-recall curves on the SSOAB corpus. The GSS SSOAB model produces the best-performing precision-recall curve. The curves for citation count and the NS ACPJ Treatment Model lie below it, indicating lower performance than the GSS SSOAB model. The SSOAB corpus was composed of 431 positives and 7,379 negatives.
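
The quantities plotted in the two figures can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the helper names and the toy ranking are assumptions, and only the corpus counts (431 positives, 7,379 negatives) come from the captions. The prevalence computed at the end is the precision a random ordering of the corpus would be expected to achieve, which gives a baseline for reading the precision-recall curves.

```python
from typing import Sequence, Tuple


def hits_at_k(ranked_labels: Sequence[int], k: int) -> int:
    """Number of true positives among the top k ranked documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k])


def precision_recall_at_k(ranked_labels: Sequence[int], k: int) -> Tuple[float, float]:
    """Precision and recall after reading the top k ranked documents."""
    total_positives = sum(ranked_labels)
    if total_positives == 0:
        raise ValueError("ranking contains no positive documents")
    hits = hits_at_k(ranked_labels, k)
    returned = min(k, len(ranked_labels))
    return hits / returned, hits / total_positives


# Toy ranking, invented for illustration: labels of documents sorted by
# descending score (1 = in the gold standard, 0 = not).
toy_ranking = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(hits_at_k(toy_ranking, 5))
print(precision_recall_at_k(toy_ranking, 5))

# Corpus counts as stated in the figure captions.
ssoab_positives = 431
ssoab_negatives = 7379
prevalence = ssoab_positives / (ssoab_positives + ssoab_negatives)
print(round(prevalence, 4))
```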
