<!DOCTYPE html>
<html lang="en" >
<head >
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<!-- Mobile properties -->
<meta name="HandheldFriendly" content="True">
<meta name="MobileOptimized" content="320">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Stylesheets -->
<link href="/research/bionlp/static/django_uswds/uswds/css/uswds.css" rel="stylesheet" />
<title>
SR4GN: a species recognition software tool for gene normalization
</title>
<link rel="stylesheet" href="/research/bionlp/static/main/css/uswds.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/header.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/footer.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/form.css">
<!-- Labs template -->
<link rel="stylesheet" href="/research/bionlp/static/main/css/atoms.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/docsum.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/media.css">
<!-- Additional template -->
<link rel="stylesheet" href="/research/bionlp/static/main/css/journals.molecules.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/custom.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/journals.journal-page.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/iconic-glyphs.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/featherlight.min.css">
<link rel="stylesheet" href="/research/bionlp/static/main/css/styles.css">
<!--[if lt IE 9]>
<link rel="stylesheet" href="/research/bionlp/static/main/css/iconic-glyphs-legacy.css">
<![endif]-->
<!-- Some JS -->
<script src="/research/bionlp/static/main/js/jquery.js"></script>
<script src="/research/bionlp/static/main/js/modernizr.js"></script>
<script src="/research/bionlp/static/main/js/featherlight.min.js"></script>
<script src="/research/bionlp/static/main/js/custom.js"></script>
</head>
<body >
<div>
<a class="skipnav" href="#maincontent">
Skip to main page content
</a>
<header class="ncbi-page-header" role="banner">
<div class="prefix">
<span class="nih" title="National Institutes of Health">
<a href="https://www.nih.gov/" title="To NIH homepage">
<img src="/research/bionlp/static/base/images/nih-logo-header.svg" alt="NIH">
</a>
</span>
<span class="nlm">
<a href="https://www.nlm.nih.gov/" title="To NLM homepage">U.S. National Library of Medicine</a>
</span>
</div>
<div class="ncbi">
<!-- <abbr class="abbr">
<a href="https://www.ncbi.nlm.nih.gov/" title="To NCBI homepage">NCBI</a>
</abbr>
<span class="name">
<a href="https://www.ncbi.nlm.nih.gov/" accesskey="1" title="To NCBI homepage">National Center for Biotechnology Information</a>
</span> -->
<!-- <abbr class="abbr">
<a href="https://www.nlm.nih.gov/research/index.html" title="To DIR homepage">DIR</a>
</abbr> -->
<span class="name">
<a href="https://www.nlm.nih.gov/research/index.html" accesskey="1" title="To DIR homepage">Division of Intramural Research</a>
</span>
<div class="right">
<a id="in" href="/research/bionlp/accounts/login/?next=/research/bionlp/">Log in</a>
</div>
</div>
</header>
<!--app-specific header, something that might want to take full width of screen -->
<div class="breadcrumbs-container menu">
<div class="usa-grid-full">
<ul class="topnav" accesskey="4">
<li class="current">
<a href="/research/bionlp/" title="Home">
Home
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Zhiyong-Lu" title="Zhiyong Lu">
Zhiyong Lu
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/News" title="Media">
Media
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Team" title="Team">
Team
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Research" title="Research">
Research
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Publications/" title="Publications">
Publications
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Tools/" title="Tools">
Tools
</a>
</li>
<li>
<a href="/research/bionlp/APIs/" title="Web APIs">
Web APIs
</a>
</li>
<li class="separator"></li>
<li>
<a href="/research/bionlp/Data/" title="Data">
AI Datasets
</a>
</li>
<li>
<a href="/research/bionlp/Visiting-us" title="Visiting us">
Visiting us
</a>
</li>
<li class="icon">
<a href="#">&#9776;</a>
</li>
</ul>
</div>
</div>
<!-- assign css class in case app will need to alter styles of this div -->
<div id="maincontent" class="usa-grid-full ncbi-base-page-container">
<div class="labs-pagecontent">
<div class="usa-width-one-whole">
<main class="usa-grid journals-lists">
<h3>SR4GN: a species recognition software tool for gene normalization</h3>
<main class="usa-width-one-whole journal-container">
<div>
<div class="issue labs-docsums labs-content-box wrappall">
<h4>Authors: <a href="https://sites.google.com/site/chihhsuanwei/" target="_blank">Chih-Hsuan Wei</a>, <a
href="http://myweb.ncku.edu.tw/~hykao/" target="_blank">Hung-Yu Kao</a> and <a
href="/bionlp/" target="_blank">Zhiyong Lu</a> (PI)</h4>
<h4>Research highlights</h4>
<div class="usa-width-one-whole">
<p>
As suggested in recent studies, species recognition and disambiguation is one of the most critical
and challenging steps in many downstream text-mining applications, such as gene normalization
and protein-protein interaction extraction. We report SR4GN, an open source tool for species
recognition and disambiguation in biomedical text. Beyond the species detection offered by
existing tools, <b>SR4GN is optimized for the Gene Normalization task</b>: it is designed
to link detected species to the corresponding gene mentions in a document. SR4GN achieves 85.42%
accuracy on the DECA corpus (Table 1) and compares favorably with other state-of-the-art techniques
in benchmark experiments. Finally, SR4GN is implemented as a standalone software tool, making it
convenient and robust for use in many text-mining applications.
</p>
</div>
</div>
<div class="issue labs-docsums labs-content-box wrappall">
<h4>Method overview</h4>
<div class="usa-width-one-whole">
<p>
Figure 1 gives an overview of the SR4GN system. Given an abstract or full-length article in
either XML or free-text format as input, the preprocessing step first recognizes sentence
boundaries and gene mentions. As shown in Figure 1, each sentence is assigned a
sentence identifier (SID). By default, we use <a href="http://bcsp1.iis.sinica.edu.tw/aiiagmt/"
target="_blank">AIIA-GMT</a> for gene mention
recognition, but other tools may also be used. Next, SR4GN detects organism names in the sentences
and assigns them to the pre-tagged gene names in the disambiguation step.
</p>
<div class="figure">
<img src="/research/bionlp/static/main/images/tools/SR4GN.png" alt="Overview of the SR4GN workflow"/>
<span><b>Figure 1.</b> An overview of the SR4GN workflow.</span>
</div>
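<p>
The preprocessing and assignment steps described above can be illustrated with a minimal Python
sketch. It is not the SR4GN implementation: the three-entry lexicon, the function names
(<code>split_sentences</code>, <code>detect_species</code>, <code>assign_species</code>) and the
fallback heuristic are all assumptions made for illustration. Gene mentions are taken as
pre-tagged input, each sentence receives an SID, and a gene is assigned a species named in its own
sentence or, failing that, the species mentioned most often in the document.
</p>
<pre>
```python
import re
from collections import Counter

# Hypothetical toy lexicon (organism name to NCBI Taxonomy ID); a real
# system would need a far larger dictionary plus synonym handling.
SPECIES_LEXICON = {"human": "9606", "mouse": "10090", "rat": "10116"}


def split_sentences(text):
    """Naively split on ., ! or ? and pair each sentence with an SID."""
    parts = [p.strip() for p in re.findall(r"[^.!?]+[.!?]?", text)]
    return list(enumerate(p for p in parts if p))


def detect_species(sentence):
    """Return taxonomy IDs of lexicon names found as whole words."""
    found = []
    for name, tax_id in SPECIES_LEXICON.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, sentence, re.IGNORECASE):
            found.append(tax_id)
    return found


def assign_species(text, gene_mentions):
    """Map (SID, gene) to a taxonomy ID.

    gene_mentions is a list of pre-tagged gene strings from any gene
    tagger. Assumed heuristic, not SR4GN's rules: prefer a species in
    the same sentence (first lexicon hit if several co-occur), else
    fall back to the most frequently mentioned species in the text.
    """
    sentences = split_sentences(text)
    per_sentence = {sid: detect_species(s) for sid, s in sentences}
    counts = Counter(t for ids in per_sentence.values() for t in ids)
    fallback = counts.most_common(1)[0][0] if counts else None

    result = {}
    for sid, sentence in sentences:
        for gene in gene_mentions:
            if gene in sentence:
                local = per_sentence[sid]
                result[(sid, gene)] = local[0] if local else fallback
    return result


if __name__ == "__main__":
    demo = ("Brca1 was deleted in mouse embryos. Trp53 expression also "
            "changed. Mouse survival dropped. EGFR is amplified in "
            "human tumors.")
    for key, tax_id in assign_species(demo, ["Brca1", "Trp53", "EGFR"]).items():
        print(key, tax_id)
```
</pre>
<p>
In this sketch, a gene in a sentence that names no organism inherits the document-level majority
species. Swapping in a different gene tagger only changes the <code>gene_mentions</code> list
passed to <code>assign_species</code>.
</p>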
</div>
</div>
<div class="issue labs-docsums labs-content-box wrappall">
<h4>Results</h4>
<div class="usa-width-one-whole">
<table class="customtable">
<tbody>
<tr>
<td><strong>Method</strong></td>
<td valign="top"><strong>Accuracy</strong></td>
</tr>
<tr>
<td class="best">SR4GN</td>
<td class="best">85.42%</td>
</tr>
<tr>
<td valign="top">Wang et al., 2010</td>
<td valign="top">83.80%</td>
</tr>
<tr>
<td valign="top">Mu et al., 2010</td>
<td valign="top">85.13%</td>
</tr>
</tbody>
</table>
<span><b>Table 1.</b> Evaluation on species assignment using the <a
href="http://www.nactem.ac.uk/deca_details/start.cgi" target="_blank">DECA corpus</a>.</span>
<table class="customtable">
<tbody>
<tr>
<td><strong>Species Detection Module</strong></td>
<td valign="top"><strong>TAP-5</strong></td>
<td valign="top"><strong>TAP-10</strong></td>
<td valign="top"><strong>TAP-20</strong></td>
<td valign="top"><strong>F-measure</strong></td>
</tr>
<tr>
<td class="best">SR4GN</td>
<td class="best">0.3278</td>
<td class="best">0.3543</td>
<td class="best">0.3543</td>
<td class="best">0.4691</td>
</tr>
<tr>
<td valign="top">Linnaeus</td>
<td valign="top">0.3042</td>
<td valign="top">0.3283</td>
<td valign="top">0.3283</td>
<td valign="top">0.4476</td>
</tr>
<tr>
<td valign="top">OrganismTagger</td>
<td valign="top">0.2915</td>
<td valign="top">0.3011</td>
<td valign="top">0.3011</td>
<td valign="top">0.4456</td>
</tr>
</tbody>
</table>
<span><b>Table 2.</b> Evaluation using the test data from the <a
href="http://www.biocreative.org/resources/corpora/biocreative-iii-corpus/" target="_blank">BioCreative III GN task</a>.</span>
</div>
</div>
<div class="issue labs-docsums labs-content-box wrappall">
<h4>Downloads</h4>
<div class="usa-width-one-whole">
<p>
SR4GN-tagged PubMed results in <a href="https://www.ncbi.nlm.nih.gov/research/pubtator/"
target="_blank">PubTator Central</a><br/>
<a href="/research/bionlp/APIs/">SR4GN
RESTful API</a>
</p>
</div>
</div>
<div class="issue labs-docsums labs-content-box wrappall">
<h4>Please cite</h4>
<div class="usa-width-one-whole">
<ul class="dot-list">
<li>Wei C-H, Kao H-Y, Lu Z. <a
href="http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0038460"
target="_blank">SR4GN: a species recognition software tool for gene normalization</a>. PLoS
ONE 7(6):e38460 (2012). doi:10.1371/journal.pone.0038460
</li>
</ul>
</div>
</div>
</div>
</main>
</main>
</div>
</div>
</div>
<footer class="usa-footer usa-footer-big ncbi-footer" role="contentinfo">
<div class="usa-grid">
<div class="usa-row">
<div class="usa-width-one-half">
<div>
<div class="org-section">
<a href="https://www.hhs.gov/"><img class="usa-footer-logo-img hhs-logo"
src="/research/bionlp/static/base/images/dhhs-logo-white.svg"
alt="U.S. Department of Health & Human Services">
<span class="usa-sr-only">Department of Health and Human Services</span></a>
<a href="https://www.nih.gov/"><img class="usa-footer-logo-img nih-logo"
src="/research/bionlp/static/base/images/nih-logo-white.svg"
alt="National Institutes of Health">
<span class="usa-sr-only">National Institutes of Health</span></a>
<a href="https://www.nlm.nih.gov/"><img class="usa-footer-logo-img nlm-logo"
src="/research/bionlp/static/base/images/nlm-logo-letters-white.svg"
alt="National Library of Medicine">
<span class="usa-sr-only">National Library of Medicine</span></a>
<a href="https://www.usa.gov/"><img class="usa-footer-logo-img usagov-logo"
src="/research/bionlp/static/base/images/usagov-logo-white.svg"
alt="USA.gov"/>
<span class="usa-sr-only">USA.gov</span></a>
</div>
</div>
</div>
<div class="usa-width-one-half">
<div>
<p class="about-links">
<a href="https://www.nlm.nih.gov/research/index.html">About DIR</a>
<a href="https://www.nlm.nih.gov/web_policies.html">Web Policies</a></p>
</div>
</div>
</div>
</div>
</footer>
</div>
<!-- JavaScript -->
<script src="/research/bionlp/static/django_uswds/uswds/js/uswds.js"></script>
<script type="text/javascript" src="/research/bionlp/static/base/header.js"></script>
</body>
</html>