
SR4GN: a species recognition software tool for gene normalization

Authors: Chih-Hsuan Wei, Hung-Yu Kao and Zhiyong Lu (PI)

Research highlights

Recent studies suggest that species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications, such as gene normalization and protein-protein interaction extraction. We report SR4GN, an open-source tool for species recognition and disambiguation in biomedical text. Beyond the species detection offered by existing tools, SR4GN is optimized for the gene normalization task: it links each detected species to the corresponding gene mentions in a document. SR4GN achieves 85.42% accuracy and compares favorably with other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications.

Method overview

Figure 1 shows an overview of the SR4GN system. Given an abstract or full-length article in either XML or free-text format, the preprocessing step first recognizes sentence boundaries and gene mentions. As shown in Figure 1, each sentence is assigned a sentence identifier (SID). By default, we use AIIA-GMT for gene mention recognition, but other tools may also be used. Next, SR4GN detects organism names in the sentences and assigns them to the pre-tagged gene names in the disambiguation step.

Figure 1. An overview of the SR4GN workflow.
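
The workflow above can be illustrated with a minimal Python sketch. This is not SR4GN's actual algorithm: the tiny species lexicon, the function names, and the assignment rule (prefer a species mentioned in the same sentence, otherwise fall back to the most frequent species in the document) are assumptions made purely for illustration. Gene mentions are taken as pre-tagged input, standing in for the output of a tool such as AIIA-GMT.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping organism names to NCBI Taxonomy IDs.
# A real system would rely on a far larger dictionary.
SPECIES_LEXICON = {
    "human": "9606",
    "mouse": "10090",
    "mice": "10090",
    "rat": "10116",
}


def split_sentences(text):
    """Naive sentence splitter; returns a list of (SID, sentence) pairs."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return list(enumerate(parts))


def detect_species(sentence):
    """Return the taxonomy IDs of all lexicon words found in a sentence."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return [SPECIES_LEXICON[w.lower()] for w in words if w.lower() in SPECIES_LEXICON]


def assign_species(sentences, gene_mentions):
    """Assign a species to each pre-tagged gene mention.

    gene_mentions is a list of (SID, gene_name) pairs.
    Assumed rule: most frequent species in the same sentence,
    else most frequent species in the whole document, else None.
    """
    per_sentence = {sid: detect_species(s) for sid, s in sentences}
    document_counts = Counter(t for ids in per_sentence.values() for t in ids)
    fallback = document_counts.most_common(1)[0][0] if document_counts else None

    assignments = []
    for sid, gene in gene_mentions:
        local = per_sentence.get(sid, [])
        species = Counter(local).most_common(1)[0][0] if local else fallback
        assignments.append((gene, species))
    return assignments


if __name__ == "__main__":
    text = ("Mouse Trp53 is mutated in many mice. "
            "BRCA1 was studied in human cells. "
            "Expression of Pten was reduced.")
    sentences = split_sentences(text)
    genes = [(0, "Trp53"), (1, "BRCA1"), (2, "Pten")]  # pre-tagged gene mentions
    for gene, species in assign_species(sentences, genes):
        print(gene, species)
```

In this sketch the third gene mention has no organism name in its own sentence, so it inherits the species mentioned most often in the document. A real disambiguation step would need to handle cases this rule gets wrong, such as documents that discuss several species equally often.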

Results

Method               Accuracy
SR4GN                85.42%
Wang et al., 2010    83.80%
Mu et al., 2010      85.13%
Table 1. Evaluation on species assignment using the DECA corpus.

Species Detection Module   TAP-5    TAP-10   TAP-20   F-measure
SR4GN                      0.3278   0.3543   0.3543   0.4691
Linnaeus                   0.3042   0.3283   0.3283   0.4476
OrganismTagger             0.2915   0.3011   0.3011   0.4456
Table 2. Evaluation using the test data from the BioCreative III GN task.

Downloads

SR4GN-tagged PubMed results in PubTator Central
SR4GN RESTful API

Please cite