Int J Med Inform. 2006 Jun;75(6):496-500.
doi: 10.1016/j.ijmedinf.2005.06.011. Epub 2005 Aug 8.

Distributed modules for text annotation and IE applied to the biomedical domain

Harald Kirsch et al. Int J Med Inform. 2006 Jun.

Abstract

Biological databases contain facts from the scientific literature that have been curated by hand to ensure high quality. Curation is time-consuming and can be supported by information extraction methods. We present a server software infrastructure that makes it easy to plug in modules which identify biologically interesting pieces of text; these are then presented to the curator in a web interface. Modules are available to identify UniProt, UMLS and GO terminology, gene and protein names, mutations, and protein-protein interactions. UniProt, UMLS and GO concepts are automatically linked to their original sources. The mutation module is based on syntax patterns, and the protein-protein interaction module relies on chunk parsing. All modules run as separate servers, possibly distributed across different machines, and can be combined into processing pipelines as needed. Communication is based on XML-annotated text streams: each server processes the XML elements it is designed for and may add further information in the form of XML annotation. The server and the underlying software are publicly available.
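
To make the stream-based design concrete, here is a minimal sketch of two pipeline stages that each parse an XML text stream, annotate only the elements they target, and pass the re-serialized stream on. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the XML schema or the server protocol, so the element names (`<sentence>`, `<mutation>`, `<term>`), the class `AnnotationModule`, the function `run_pipeline`, the point-mutation regex (standing in for the paper's syntax patterns), and the one-entry toy dictionary (standing in for UniProt lookup) are all hypothetical. In-process function calls stand in for the separate networked servers.

```python
import re
import xml.etree.ElementTree as ET


def _split_run(text, pattern, tag, make_attrs):
    """Cut one text run at every regex match; return (leading text, new elements)."""
    created = []
    lead = text
    pos = 0
    for m in pattern.finditer(text or ""):
        gap = text[pos:m.start()]
        if created:
            created[-1].tail = gap  # text between two matches
        else:
            lead = gap  # text before the first match
        el = ET.Element(tag, make_attrs(m))
        el.text = m.group(0)
        created.append(el)
        pos = m.end()
    if created:
        created[-1].tail = text[pos:]  # text after the last match
    return lead, created


def annotate_inline(elem, pattern, tag, make_attrs):
    """Wrap matches found in elem's own text runs; existing annotations are left intact."""
    rebuilt = []
    elem.text, created = _split_run(elem.text, pattern, tag, make_attrs)
    rebuilt.extend(created)
    for child in list(elem):
        child.tail, created = _split_run(child.tail, pattern, tag, make_attrs)
        rebuilt.append(child)
        rebuilt.extend(created)
    for child in list(elem):
        elem.remove(child)
    elem.extend(rebuilt)


class AnnotationModule:
    """Hypothetical pipeline stage: parse the stream, annotate target elements, re-serialize."""

    def __init__(self, target_tag, pattern, out_tag, make_attrs):
        self.target_tag = target_tag
        self.pattern = pattern
        self.out_tag = out_tag
        self.make_attrs = make_attrs

    def __call__(self, xml_text):
        root = ET.fromstring(xml_text)
        # Snapshot the targets first so inserting new elements cannot disturb iteration.
        for elem in list(root.iter(self.target_tag)):
            annotate_inline(elem, self.pattern, self.out_tag, self.make_attrs)
        return ET.tostring(root, encoding="unicode")


def run_pipeline(xml_text, modules):
    """Feed the serialized stream through each stage in order, like a chain of servers."""
    for module in modules:
        xml_text = module(xml_text)
    return xml_text


# Simplified point-mutation pattern such as "R175H" (an assumption, not the paper's patterns).
AMINO = "ACDEFGHIKLMNPQRSTVWY"
MUTATION = re.compile(rf"\b([{AMINO}])(\d+)([{AMINO}])\b")
# Toy dictionary; a real terminology module would consult UniProt itself.
TOY_DICT = {"p53": "P04637"}
TERM = re.compile(r"\b(" + "|".join(map(re.escape, TOY_DICT)) + r")\b")

mutation_module = AnnotationModule(
    "sentence", MUTATION, "mutation",
    lambda m: {"wt": m.group(1), "pos": m.group(2), "mut": m.group(3)},
)
term_module = AnnotationModule(
    "sentence", TERM, "term",
    lambda m: {"db": "UniProt", "id": TOY_DICT[m.group(0)]},
)

doc = "<doc><sentence>The R175H mutation in p53 was studied.</sentence></doc>"
print(run_pipeline(doc, [mutation_module, term_module]))
```

Running both stages wraps `R175H` in a `<mutation>` element and `p53` in a `<term>` element while leaving the sentence text itself unchanged. Because every stage consumes and emits plain XML text, stages can be reordered, dropped, or moved to another machine without changing one another, which is the property the abstract attributes to the server modules.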