DNorm: Disease Named Entity Recognition and Normalization with Pairwise Learning to Rank

Authors: Robert Leaman, Rezarta Islamaj Dogan and Zhiyong Lu (PI)

Research highlights (demo)

DNorm is an automated method for determining which diseases are mentioned in biomedical text, a task known as disease normalization. Diseases are central to many lines of biomedical research, so the task matters both for work on etiology (e.g., gene-disease relationships) and for work on clinical aspects (e.g., diagnosis, prevention, and treatment). DNorm is a high-performing, mathematically principled framework that learns similarities between mentions and concept names directly from training data. It is the first technique to use machine learning to normalize disease names and the first method to employ pairwise learning to rank in a normalization task. DNorm achieved the best performance in the 2013 ShARe/CLEF shared task on disease normalization in clinical notes.

Method overview

The technique consists of a series of processing steps, summarized in Figure 1 and described below.

Figure 1. Processing pipeline diagram.
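
To make the idea of pairwise learning to rank concrete, the following is a minimal sketch and not the DNorm implementation. It assumes a bilinear scoring function score(m, n) = m^T W n over sparse token vectors, a weight matrix W that starts as the identity (so the initial score is a plain dot product), and a margin ranking update applied whenever the correct name does not outscore an incorrect name by at least the margin. The token weights, learning rate, margin, and all function names are illustrative assumptions.

```python
from typing import Dict

Vector = Dict[str, float]             # sparse vector: token -> weight
Matrix = Dict[str, Dict[str, float]]  # sparse matrix: mention token -> name token -> weight


def score(W: Matrix, m: Vector, n: Vector) -> float:
    """Bilinear score m^T W n. Entries missing from W default to the identity."""
    total = 0.0
    for ti, mi in m.items():
        row = W.get(ti, {})
        for tj, nj in n.items():
            default = 1.0 if ti == tj else 0.0
            total += mi * row.get(tj, default) * nj
    return total


def update(W: Matrix, m: Vector, n_pos: Vector, n_neg: Vector,
           lr: float = 0.1, margin: float = 1.0) -> bool:
    """One SGD step on the loss max(0, margin - score(m, n_pos) + score(m, n_neg)).

    Returns True if W was changed, False if the pair was already ranked
    correctly by at least the margin.
    """
    if score(W, m, n_pos) - score(W, m, n_neg) >= margin:
        return False
    for ti, mi in m.items():
        row = W.setdefault(ti, {})
        for tj in set(n_pos) | set(n_neg):
            default = 1.0 if ti == tj else 0.0
            step = mi * (n_pos.get(tj, 0.0) - n_neg.get(tj, 0.0))
            row[tj] = row.get(tj, default) + lr * step
    return True


# Hypothetical toy example: one mention, one correct and one incorrect name.
mention = {"renal": 1.0, "failure": 1.0}
name_pos = {"kidney": 1.0, "failure": 1.0}   # assumed correct concept name
name_neg = {"heart": 1.0, "failure": 1.0}    # assumed incorrect concept name

W: Matrix = {}
initial_gap = score(W, mention, name_pos) - score(W, mention, name_neg)

n_updates = 0
for _ in range(10):
    if update(W, mention, name_pos, name_neg):
        n_updates += 1

final_gap = score(W, mention, name_pos) - score(W, mention, name_neg)
```

With the identity initialization both names tie, because each shares only the token "failure" with the mention. The updates add weight to the pair ("renal", "kidney") and subtract it from ("renal", "heart"), which is how a model of this kind can learn synonymy that exact token overlap cannot capture.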

Results

We evaluated the system on the NCBI Disease Corpus test set at the abstract level: each prediction is an association between a disease concept and an abstract, not an individual mention.

Method                        Precision   Recall   F-measure
NLM Lexical Normalization     0.218       0.685    0.331
MetaMap                       0.502       0.665    0.572
Inference Method              0.533       0.662    0.591
BANNER + Lucene               0.612       0.647    0.629
BANNER + cosine similarity    0.649       0.674    0.661
DNorm (BANNER + pLTR)         0.803       0.763    0.782
Table 1. Evaluation of DNorm against several baseline techniques, using micro-averaged precision, recall and F-measure.
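
As an illustration of the abstract-level, micro-averaged metrics reported above, here is a minimal sketch that pools true positives, false positives, and false negatives across all abstracts before computing precision, recall, and F-measure. The function name, the abstract identifiers, and the concept identifiers are made up for the example; this is not the evaluation script used to produce Table 1.

```python
from typing import Dict, Set, Tuple


def micro_prf(gold: Dict[str, Set[str]],
              predicted: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F-measure over (abstract, concept) pairs."""
    tp = fp = fn = 0
    for doc_id in gold.keys() | predicted.keys():
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)   # concepts correctly assigned to this abstract
        fp += len(p - g)   # concepts predicted but not annotated
        fn += len(g - p)   # concepts annotated but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy data: each abstract maps to a set of concept identifiers,
# so repeated mentions of the same disease in one abstract count only once.
gold = {"abs1": {"C001", "C002"}, "abs2": {"C003"}}
predicted = {"abs1": {"C001"}, "abs2": {"C003", "C004"}}

precision, recall, f_measure = micro_prf(gold, predicted)
```

The F-measure here is the harmonic mean of precision and recall, which matches the rows of Table 1: for DNorm, 2 × 0.803 × 0.763 / (0.803 + 0.763) ≈ 0.782.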

Downloads

Please cite