J Med Libr Assoc. 2015 Jan;103(1):22-30.
doi: 10.3163/1536-5050.103.1.005.

Building a gold standard to construct search filters: a case study with biomarkers for oral cancer

John J Frazier et al. J Med Libr Assoc. 2015 Jan.

Abstract

Objective: Librarians and informationists who support clinical researchers may need search filters tailored to particular tasks. Development of filters typically depends on a "gold standard" dataset. This paper describes generalizable methods for creating a gold standard to support future filter development and evaluation, using oral squamous cell carcinoma (OSCC) as a case study. OSCC is the most common malignancy affecting the oral cavity, and investigation of biomarkers with potential prognostic utility is an active area of OSCC research. The methods discussed here should be useful for designing high-quality search filters in similar domains.

Methods: The authors searched MEDLINE for prognostic studies of OSCC, developed annotation guidelines for screeners, ran three calibration trials before annotating the remaining body of citations, and measured inter-annotator agreement (IAA).
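The MEDLINE search step can be scripted. The following is a minimal sketch, assuming the NCBI E-utilities ESearch service is used to retrieve PubMed IDs. The query string is hypothetical and is not the authors' search strategy, and `build_esearch_url` and `HYPOTHETICAL_QUERY` are illustrative names. The function only builds the request URL, so it runs without network access.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (documented public service).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 2000) -> str:
    """Return an ESearch URL that would retrieve up to `retmax` PubMed IDs for `term`."""
    params = {
        "db": "pubmed",     # search the PubMed database, which includes MEDLINE
        "term": term,       # the Boolean query string
        "retmax": retmax,   # maximum number of IDs returned in one response
        "retmode": "json",  # request JSON instead of the default XML
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical query for illustration only; NOT the authors' search strategy.
HYPOTHETICAL_QUERY = (
    '"Carcinoma, Squamous Cell"[MeSH] AND "Mouth Neoplasms"[MeSH] '
    'AND "Prognosis"[MeSH] AND medline[sb]'
)

if __name__ == "__main__":
    print(build_esearch_url(HYPOTHETICAL_QUERY))
```

ESearch returns at most `retmax` IDs per request. The default of 2000 is chosen here only because it exceeds the 1,818 citations reported in the Results. Larger result sets would need paging with the `retstart` parameter.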

Results: We retrieved 1,818 citations. After calibration, we screened the remaining citations (n = 1,767; 97.2%); IAA was substantial (kappa = 0.76). The dataset has 497 (27.3%) citations representing OSCC studies of potential prognostic biomarkers.
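Inter-annotator agreement is commonly summarized with a kappa statistic, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). On the widely used Landis and Koch benchmarks, values from 0.61 to 0.80 are described as "substantial". The following is a minimal sketch, assuming two screeners and Cohen's kappa (the choice of statistic is an assumption here). `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotators must label the same non-empty list of items")
    n = len(a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label proportions.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical screening decisions for ten citations (not study data).
    screener_1 = ["rel", "rel", "irr", "irr", "rel", "irr", "irr", "irr", "rel", "irr"]
    screener_2 = ["rel", "irr", "irr", "irr", "rel", "irr", "irr", "rel", "rel", "irr"]
    print(round(cohens_kappa(screener_1, screener_2), 3))  # 0.583
```

In this toy example the screeners agree on 8 of 10 citations (p_o = 0.80). However, chance agreement from their marginal frequencies is already 0.52, so kappa is only about 0.58. This is why kappa is preferred over raw percent agreement.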

Conclusions: The gold standard dataset is likely to be high quality and useful for future development and evaluation of filters for OSCC studies of potential prognostic biomarkers.

Implications: The methodology we used is generalizable to other domains requiring a reference standard to evaluate the performance of search filters. A gold standard is essential because the labels regarding relevance enable computation of diagnostic metrics, such as sensitivity and specificity. Librarians and informationists with data analysis skills could contribute to developing gold standard datasets and subsequent filters tuned for their patrons' domains of interest.
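Once every citation in a gold standard carries a relevance label, a candidate filter's output can be scored against it. The following is a minimal sketch, assuming the gold standard is a set of PubMed IDs with a subset labeled relevant, and that the filter returns a subset of those IDs. `evaluate_filter` and `FilterMetrics` are hypothetical names. Precision is included alongside sensitivity and specificity because it is often reported for search filters.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class FilterMetrics:
    sensitivity: float  # relevant citations retrieved / all relevant citations
    specificity: float  # irrelevant citations excluded / all irrelevant citations
    precision: float    # relevant citations retrieved / all citations retrieved


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than dividing by zero when a category is empty.
    return num / den if den else float("nan")


def evaluate_filter(all_ids: Set[str], relevant_ids: Set[str],
                    retrieved_ids: Set[str]) -> FilterMetrics:
    """Score a filter's retrieved IDs against gold-standard relevance labels."""
    if not (relevant_ids <= all_ids and retrieved_ids <= all_ids):
        raise ValueError("relevant and retrieved IDs must belong to the gold standard")
    irrelevant_ids = all_ids - relevant_ids
    tp = len(relevant_ids & retrieved_ids)    # relevant and retrieved
    fn = len(relevant_ids - retrieved_ids)    # relevant but missed
    tn = len(irrelevant_ids - retrieved_ids)  # irrelevant and correctly excluded
    fp = len(irrelevant_ids & retrieved_ids)  # irrelevant but retrieved
    return FilterMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        precision=_ratio(tp, tp + fp),
    )


if __name__ == "__main__":
    # Gold-standard sizes from the Results (1,818 citations, 497 relevant);
    # the filter's output below is invented purely for illustration.
    gold = {str(i) for i in range(1818)}
    relevant = {str(i) for i in range(497)}
    retrieved = {str(i) for i in range(450)} | {str(i) for i in range(497, 697)}
    print(evaluate_filter(gold, relevant, retrieved))
```

With these invented numbers, the filter finds 450 of 497 relevant citations (sensitivity about 0.91) and excludes 1,121 of 1,321 irrelevant ones (specificity about 0.85). Only about 69% of what it retrieves is relevant. This is the kind of trade-off a gold standard makes measurable.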


Figures

Figure 1. Flowchart for development of a gold standard dataset. IAA = inter-annotator agreement.
Figure 2. Interface for the Gastrointestinal Annotation Tool, now referred to as the General Information Annotation Tool. For this task, a “patient report” is a portion of the complete citation for a scientific article; keywords are National Library of Medicine Medical Subject Headings assigned by PubMed indexers (not displayed).
