SR4GN: a species recognition software tool for gene normalization
Authors: Chih-Hsuan Wei, Hung-Yu Kao and Zhiyong Lu (PI)
Research highlights
Recent studies suggest that species recognition and disambiguation is one of the most critical and challenging steps in many downstream text-mining applications, such as gene normalization and protein-protein interaction extraction. We report SR4GN, an open source tool for species recognition and disambiguation in biomedical text. Beyond the species detection offered by existing tools, SR4GN is optimized for the gene normalization task: it links each detected species to the corresponding gene mentions in a document. SR4GN achieves 85.42% accuracy and compares favorably with other state-of-the-art techniques in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it convenient and robust for use in many text-mining applications.
Method overview
Figure 1 shows an overview of the SR4GN system. The input is an abstract or full-length article in either XML or free-text format. In the preprocessing step, sentence boundaries and gene mentions are recognized, and each sentence is assigned a sentence identifier (SID), as shown in Figure 1. By default, we use AIIA-GMT for gene mention recognition, but other tools may also be used. Next, SR4GN detects organism names in the sentences and assigns them to the pre-tagged gene names in the disambiguation step.
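The flow described above (sentence splitting with SIDs, species detection, then assignment of a species to each pre-tagged gene mention) can be illustrated with a minimal Python sketch. It is not SR4GN's implementation. The toy lexicon, the function names, the naive sentence splitter, and the assignment heuristic (nearest species in the same sentence, otherwise the most frequent species in the document) are all assumptions made for illustration. Gene mentions are passed in as plain strings, standing in for the output of a gene tagger such as AIIA-GMT.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (surface form -> NCBI Taxonomy ID); not SR4GN's dictionary.
SPECIES_LEXICON = {
    "human": "9606",
    "mouse": "10090",
    "mice": "10090",
    "rat": "10116",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Returns a list of (sid, sentence) pairs, with SIDs starting at 0.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [(sid, s) for sid, s in enumerate(parts) if s]


def find_species(sentence):
    """Return [(char_offset, taxonomy_id)] for lexicon hits in one sentence."""
    hits = []
    for m in re.finditer(r"[A-Za-z]+", sentence):
        tax_id = SPECIES_LEXICON.get(m.group().lower())
        if tax_id:
            hits.append((m.start(), tax_id))
    return hits


def assign_species(text, gene_mentions):
    """Assign a taxonomy ID to each occurrence of each pre-tagged gene mention.

    Assumed heuristic (for illustration only): use the species mention
    closest to the gene within the same sentence; if the sentence has none,
    fall back to the most frequent species in the whole document.
    Returns a list of (sid, gene, taxonomy_id_or_None).
    """
    sentences = split_sentences(text)
    per_sentence = {sid: find_species(s) for sid, s in sentences}
    counts = Counter(t for hits in per_sentence.values() for _, t in hits)
    fallback = counts.most_common(1)[0][0] if counts else None

    results = []
    for sid, sentence in sentences:
        hits = per_sentence[sid]
        for gene in gene_mentions:
            # Case-sensitive exact match of the pre-tagged gene string.
            for m in re.finditer(re.escape(gene), sentence):
                if hits:
                    tax_id = min(hits, key=lambda h: abs(h[0] - m.start()))[1]
                else:
                    tax_id = fallback
                results.append((sid, gene, tax_id))
    return results


if __name__ == "__main__":
    doc = (
        "Mouse Brca1 is required for development in mice. "
        "BRCA1 mutations are studied in human cells. "
        "The Tp53 gene was also examined."
    )
    for row in assign_species(doc, ["Brca1", "BRCA1", "Tp53"]):
        print(row)
```

In this sketch the third sentence contains no organism name, so its gene mention inherits the document-level majority species. A real system would need a far larger lexicon and more careful disambiguation rules than this toy heuristic.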

Results
Method | Accuracy |
SR4GN | 85.42% |
Wang et al., 2010 | 83.80% |
Mu et al., 2010 | 85.13% |
Species Detection Module | TAP-5 | TAP-10 | TAP-20 | F-measure |
SR4GN | 0.3278 | 0.3543 | 0.3543 | 0.4691 |
Linnaeus | 0.3042 | 0.3283 | 0.3283 | 0.4476 |
OrganismTagger | 0.2915 | 0.3011 | 0.3011 | 0.4456 |
Downloads
SR4GN-tagged PubMed results in PubTator Central
SR4GN
RESTful API
Please cite
- Wei C-H, Kao H-Y, Lu Z. SR4GN: a species recognition software tool for gene normalization. PLoS ONE, 7(6):e38460. doi:10.1371/journal.pone.0038460 (2012)