<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
<TITLE> [Utilities-announce] MEDLINE / PubMed Transition to Automated MeSH Indexing
</TITLE>
<LINK REL="Index" HREF="index.html" >
<LINK REL="made" HREF="mailto:utilities-announce%40ncbi.nlm.nih.gov?Subject=Re%3A%20%5BUtilities-announce%5D%20MEDLINE%20/%20PubMed%20Transition%20to%20Automated%20MeSH%0A%20Indexing&In-Reply-To=%3Cmailman.2199.1643725730.54485.utilities-announce%40ncbi.nlm.nih.gov%3E">
<META NAME="robots" CONTENT="index,nofollow">
<style type="text/css">
pre {
white-space: pre-wrap; /* css-2.1, current FF, Opera, Safari */
}
</style>
<META http-equiv="Content-Type" content="text/html; charset=us-ascii">
<LINK REL="Next" HREF="000155.html">
</HEAD>
<BODY BGCOLOR="#ffffff">
<H1>[Utilities-announce] MEDLINE / PubMed Transition to Automated MeSH Indexing</H1>
<B>utilities-announce at ncbi.nlm.nih.gov</B>
<A HREF="mailto:utilities-announce%40ncbi.nlm.nih.gov?Subject=Re%3A%20%5BUtilities-announce%5D%20MEDLINE%20/%20PubMed%20Transition%20to%20Automated%20MeSH%0A%20Indexing&In-Reply-To=%3Cmailman.2199.1643725730.54485.utilities-announce%40ncbi.nlm.nih.gov%3E"
TITLE="[Utilities-announce] MEDLINE / PubMed Transition to Automated MeSH Indexing">utilities-announce at ncbi.nlm.nih.gov
</A><BR>
<I>Tue Feb 1 09:28:41 EST 2022</I>
<P><UL>
<LI>Next message (by thread): <A HREF="000155.html">[Utilities-announce] Subject: Virtual Codeathon Opportunity. Apply today!
</A></li>
<LI> <B>Messages sorted by:</B>
<a href="date.html#154">[ date ]</a>
<a href="thread.html#154">[ thread ]</a>
<a href="subject.html#154">[ subject ]</a>
<a href="author.html#154">[ author ]</a>
</LI>
</UL>
<HR>
<!--beginarticle-->
<PRE>Update on MEDLINE: Transition to Automated MeSH Indexing
As part of NLM's efforts to transform and accelerate biomedical discovery and improve health and health care, we are transitioning to automated MeSH indexing of MEDLINE citations. Automated indexing will provide users with timely access to MeSH indexing metadata and allow NLM to scale MeSH indexing for MEDLINE to the volume of published biomedical literature. Human indexers have been and will continue to be involved in the refinement of automated indexing and will play a significant role in the quality assurance approaches for automated indexing.
Automated MeSH indexing has been under development at NLM for many years as part of the Indexing Initiative, the most significant outcome of which was the development of the Medical Text Indexer (MTI). In recent years, as we have moved towards automation, the MTI algorithm has undergone significant refinements, including incorporation of deep learning approaches to improve the application of MeSH subheadings, the incorporation of rules and triggers for the indexing of Publication Types, and the application of major topic designation. The version of MTI used for automated indexing is termed MTIA. The method of indexing (automated, curated, fully human indexed) is identifiable in the XML of completed citations.
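
As an illustration of the last point, the indexing method can be read directly from a citation record. The sketch below assumes the PubMed XML convention in which the MedlineCitation element carries an optional IndexingMethod attribute with the value "Automated" or "Curated", and in which a missing attribute indicates fully human indexing; check this against the current PubMed DTD before relying on it. The helper names indexing_method and make_citation and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET


def indexing_method(citation):
    """Return how a MedlineCitation element was indexed.

    Assumption: the optional IndexingMethod attribute holds "Automated"
    or "Curated"; its absence is treated as fully human indexing.
    """
    return citation.get("IndexingMethod", "Human")


def make_citation(pmid, method=None):
    """Build a minimal, made-up MedlineCitation element for the demo."""
    attrs = {"Status": "MEDLINE"}
    if method is not None:
        attrs["IndexingMethod"] = method
    citation = ET.Element("MedlineCitation", attrs)
    ET.SubElement(citation, "PMID").text = pmid
    return citation


# Hypothetical records; the PMIDs are placeholders, not real citations.
records = [
    make_citation("1", "Automated"),
    make_citation("2", "Curated"),
    make_citation("3"),
]

for record in records:
    print(record.findtext("PMID"), indexing_method(record))
```

For real data, the same function could be applied to each MedlineCitation element found after loading a PubMed XML file with ET.parse.
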
MTIA is currently being applied to citations from a variety of journals; since the fall of 2019, it has completed more than 600,000 citations. Human curation of MTIA-indexed citations originally involved a scan of every citation indexed by MTIA. To scale curation, it now focuses on specific sets of citations (e.g., those involving genes and proteins) and a random sample of the remaining citations. This effort has allowed us to iteratively improve the MTIA algorithm, expand its application to an increasing number of citations, and decrease the time required to apply MeSH indexing to a given citation. Currently, MTIA indexing is based on processing the title and abstract of an article; future work will investigate automated indexing based on processing the article's full text.
To date, MTIA algorithmic improvements have focused on precision: ensuring that indexed terms are correct and that irrelevant terms are not indexed. This focus is expected to shift toward recall in the future, so that important concepts are not missed. We are specifically working to improve the MeSH indexing of chemicals through the development of the NLM-Chem identification tool. We also plan to transition to a version of MTIA that incorporates expanded deep learning approaches.
By mid-2022 we expect that all citations indexed for MEDLINE will be indexed by MTIA, with human curation applied as indicated. Beyond achievement of this milestone, the MTIA algorithm will continue to be refined and improved as noted above, and significant resources are expected to be devoted to this initiative.
</PRE>
<!--endarticle-->
<HR>
<a href="http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce">More information about the Utilities-announce
mailing list</a><br>
</body></html>