DNorm: Disease Named Entity Recognition and Normalization

Authors: Robert Leaman, Rezarta Islamaj Doğan and Zhiyong Lu (PI)

Research highlights (demo)

DNorm is an automated method for determining which diseases are mentioned in biomedical text, a task known as disease normalization. Diseases play a central role in biomedical research, so this task matters for many lines of inquiry, including etiology (e.g. gene-disease relationships) and clinical aspects (e.g. diagnosis, prevention, and treatment). DNorm is a high-performing and mathematically principled framework for learning similarities between mentions and concept names directly from training data. It is the first technique to use machine learning to normalize disease names, and the first method to employ pairwise learning to rank in a normalization task. DNorm achieved the best performance in the 2013 ShARe/CLEF shared task on disease normalization in clinical notes.
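
To make the idea of pairwise learning to rank more concrete, the sketch below trains a small similarity matrix so that a mention scores higher against its correct concept name than against an incorrect one. It is an illustration under stated assumptions, not the DNorm implementation: the bilinear score, the margin of 1, the learning rate, the four-word vocabulary, and all function names are hypothetical choices made for this example.

```python
# Illustrative sketch only: toy vectors and hypothetical names, not the DNorm code.
# Assumed vocabulary order: ["cancer", "carcinoma", "breast", "lung"]

def identity(dim):
    """Start from the identity matrix, i.e. plain term overlap."""
    return [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]

def score(W, mention, name):
    """Bilinear similarity mention^T W name (an assumed scoring form)."""
    return sum(mention[i] * W[i][j] * name[j]
               for i in range(len(mention)) for j in range(len(name)))

def train(triples, dim, lr=0.1, epochs=20):
    """Pairwise ranking: for each (mention, correct, wrong) triple, update W
    whenever the correct name fails to outscore the wrong one by a margin of 1."""
    W = identity(dim)
    for _ in range(epochs):
        for mention, pos, neg in triples:
            if 1.0 - score(W, mention, pos) + score(W, mention, neg) > 0:
                for i in range(dim):
                    for j in range(dim):
                        W[i][j] += lr * mention[i] * (pos[j] - neg[j])
    return W

mention = [1, 0, 1, 0]   # "breast cancer"
correct = [0, 1, 1, 0]   # "breast carcinoma"
wrong = [1, 0, 0, 1]     # "lung cancer"

W0 = identity(4)
W = train([(mention, correct, wrong)], dim=4)
print(score(W0, mention, correct), score(W0, mention, wrong))  # tied before training
print(score(W, mention, correct) > score(W, mention, wrong))
```

Before training, both candidate names share exactly one word with the mention and are tied. After training, the matrix has learned from the example that "cancer" and "carcinoma" are related, which breaks the tie in favor of the correct name.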

Method overview

The technique consists of a series of processing steps summarized in Figure 1 and described below.

Figure 1. Processing pipeline diagram.

Results

We evaluated the system on the NCBI Disease Corpus test set at the level of associations between the disease concept and the abstract, not individual mentions.

Method	Precision	Recall	F-measure
NLM Lexical Normalization	0.218	0.685	0.331
MetaMap	0.502	0.665	0.572
Inference Method	0.533	0.662	0.591
BANNER + Lucene	0.612	0.647	0.629
BANNER + cosine similarity	0.649	0.674	0.661
DNorm (BANNER + pLTR)	0.803	0.763	0.782

Table 1. Evaluation of DNorm against several baseline techniques, using micro-averaged precision, recall and F-measure.
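
As a rough illustration of how abstract-level, micro-averaged scores of this kind can be computed, the sketch below pools true positives, false positives and false negatives over all abstracts before computing precision, recall and F-measure. The abstract and concept identifiers are made up, and the function names are hypothetical; this is not the evaluation script used to produce Table 1.

```python
# Illustrative sketch only: made-up identifiers, not the actual evaluation script.

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def micro_prf(gold, predicted):
    """gold, predicted: dicts mapping an abstract ID to a set of concept IDs.
    Counts are pooled over all abstracts before the ratios are taken."""
    tp = fp = fn = 0
    for doc_id in set(gold) | set(predicted):
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        tp += len(g & p)   # concepts correctly associated with the abstract
        fp += len(p - g)   # predicted but not annotated
        fn += len(g - p)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)

# Hypothetical annotations: each concept counts once per abstract,
# no matter how many times it is mentioned.
gold = {"abstract1": {"C1", "C2"}, "abstract2": {"C3"}}
predicted = {"abstract1": {"C1"}, "abstract2": {"C3", "C4"}}
print(micro_prf(gold, predicted))

# The F-measure column of Table 1 follows from the other two columns:
print(round(f_measure(0.803, 0.763), 3))
```

Because each abstract contributes a set of concepts, repeated mentions of the same disease within one abstract do not inflate the counts.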

Downloads

DNorm Software
NCBI Disease Corpus
DNorm-tagged PubMed results in PubTator
DNorm RESTful API

Please cite

Robert Leaman, Rezarta Islamaj Doğan and Zhiyong Lu. DNorm: Disease Name Normalization with Pairwise Learning to Rank. Bioinformatics (2013) 29 (22): 2909-2917, doi:10.1093/bioinformatics/btt474
Robert Leaman, Ritu Khare and Zhiyong Lu. NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with DNorm. Working Notes of the Conference and Labs of the Evaluation Forum (2013)
Robert Leaman and Zhiyong Lu. Automated Disease Normalization with Low Rank Approximations. Proceedings of BioNLP 2014: pp 24-28