AMIA Annu Symp Proc. 2006;2006:6-10.

Prospective validation of text categorization filters for identifying high-quality, content-specific articles in MEDLINE

Yindalon Aphinyanaphongs et al. AMIA Annu Symp Proc. 2006.

Abstract

In prior work, we introduced a machine learning method to identify high-quality MEDLINE documents in internal medicine. The performance of the original filter models, built with a 1998-2000 corpus, was not assessed directly on years outside 1998-2000. Assessing these models on current corpora is crucial for three reasons: to validate them for use in current years, to verify that the model fitting and model error estimation procedures do not over-fit the models, and to confirm the consistency of the chosen ACP Journal Club (ACPJ) gold standard (i.e., that ACPJ editorial policies and criteria are stable over time). In our prospective validation, the original machine learning filter models built from the 1998-2000 corpora maintained their discriminatory performance when applied to a 2005 corpus, with areas under the curve of 0.97, 0.97, 0.94, and 0.94 for treatment, etiology, diagnosis, and prognosis, respectively. These results indicate that the ACPJ is a stable, reliable gold standard and that the machine learning methodology provides robust models and model performance estimates. Machine learning filter models built with 1998-2000 corpora can be applied to identify high-quality articles in recent years.
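As an illustration of the reported metric, the following is a minimal sketch of how an area under the curve could be computed for a previously trained filter applied to a later corpus. It assumes the reported figure is the area under the ROC curve and uses the pairwise (Mann-Whitney) formulation. The function name `auc` and the labels and scores are hypothetical; they are not the authors' code or data.

```python
def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 for a gold-standard (high-quality) article, 0 otherwise.
    scores: the filter model's output for each article; higher means
            the model considers the article more likely high quality.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of (positive, negative) pairs in which the positive article
    # receives the higher score; tied scores count as half a correct pair.
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


# Hypothetical example: scores assigned by a model trained on an earlier
# corpus to eight articles from a later corpus (illustrative values only).
labels_later = [1, 1, 1, 0, 0, 0, 0, 0]
scores_later = [0.92, 0.81, 0.40, 0.55, 0.30, 0.22, 0.10, 0.05]
print(round(auc(labels_later, scores_later), 3))
```

An area near 1.0 means the model ranks almost every gold-standard article above almost every other article. A value on a later corpus that stays close to the value estimated on the training-era corpus is the kind of stability the abstract describes.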
