<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title>The Year 2000 Solution for ELHILL and the MEDLARS Databases. Sep-Oct 1998. NLM Technical Bulletin </title>
<meta name="DC.Subject.IssueCover" content="/pubs/techbull/so98/so98_technote.html" />
<meta name="DC.Subject.IssueNum" content="304" />
<!--do not remove!! nlm survey script - contact wwwnlm@nlm.nih.gov with questions -->
<script src="/share/scripts/survey.js" type="text/javascript" language="javascript"></script>
<meta name="DC.Subject.Keyword" content="AIDSLINE" />
<meta name="DC.Subject.Keyword" content="Automatic SDI" />
<meta name="DC.Subject.Keyword" content="CATLINE" />
<meta name="DC.Subject.Keyword" content="Date Created" />
<meta name="DC.Subject.Keyword" content="Date of Publication" />
<meta name="DC.Subject.Keyword" content="Editor's Note" />
<meta name="DC.Subject.Keyword" content="ELHILL" />
<meta name="DC.Subject.Keyword" content="Gratefully Yours" />
<meta name="DC.Subject.Keyword" content="Health Services and Technology Assessment Research Database" />
<meta name="DC.Subject.Keyword" content="Indexing" />
<meta name="DC.Subject.Keyword" content="Medical Literature Analysis and Retrieval System" />
<meta name="DC.Subject.Keyword" content="Medical Subject Headings" />
<meta name="DC.Subject.Keyword" content="MEDLINE" />
<meta name="DC.Subject.Keyword" content="Subheading" />
<meta name="DC.Subject.Keyword" content="Year-End Processing" />
</head>
<body bgcolor="#f0f0f0" link="#083184" vlink="#960044"><noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-MT6MLL" height="0" width="0" style="display:none;visibility:hidden" title="googletagmanager"></iframe></noscript><script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-MT6MLL');</script>
<style type="text/css">
#skip, .skip, .skipnavigation {
position:absolute;
left:0px;
top:-500px;
width:1px;
height:1px;
overflow:hidden;
}
</style>
<div class="skipnavigation"><a title="Skip the navigation on this page" href="#skipnav" class="skipnavigation">Skip Navigation Bar</a></div>
<a href="/nlmhome.html"><img src="/pubs/techbull/tb_graphics/tbhead4.gif" border=0 alt="NLM Technical Bulletin"/></a>
<h3><em>NLM Technical Bulletin</em> 1998 Sep-Oct; 304</h3>
<hr />
<a id="skipnav" name="skipnav"></a>
<table border=0 cellspacing=0 cellpadding=10>
<tr>
<td valign=top bgcolor="#CB9966" width=140>
<hr />
<center><strong>In This Issue:</strong></center>
<hr />
<p>
<a href="/pubs/techbull/so98/so98_technote.html">Technical Notes</a> - e1
</p>
<p>
<a href="/pubs/techbull/so98/so98_yep.html">Year-End Processing</a> - e2
</p>
<p>
<a href="/pubs/techbull/so98/so98_sdi.html">YEP of Stored and Saved Searches, Changes to Automatic SDI's and Saved Searches - ELHILL MEDLARS System</a> - e3
</p>
<p>
<a href="/pubs/techbull/so98/so98_gy.html">Farewell to <em>Gratefully Yours</em></a> - e4
</p>
<p>
<img src="/pubs/techbull/tb_graphics/blutri.gif" alt="dot" height=10 width=10/>The Year 2000 Solution for ELHILL and the MEDLARS Databases
</p>
<p>
<a href="/pubs/techbull/so98/so98_mesh.html">MeSH Coming Attractions</a> - e6
</p>
<p>
<a href="/pubs/techbull/so98/so98_avcat.html">AVLINE&#174; and CATLINE&#174; Data to be Removed from Other NLM Databases [corrected 1999/02/09]</a> - e7
</p>
<p>
<a href="/pubs/techbull/so98/so98_meshweb.html"> MeSH on the Web</a> - e8
</p>
<hr />
<center><strong>Appendixes:</strong></center>
<hr />
<p>
<a href="/pubs/techbull/so98/so98_meshpreexp.html">List of MeSH Heading Pre-explosions</a>
</p>
<p>
<a href="/pubs/techbull/so98/so98_subheadpreexp.html">List of Subheading Pre-explosions</a>
</p>
<p>
<a href="so98_medupdate.html">MEDLINE - 1999 Weekly Update Schedule on ELHILL</a>
</p>
<p>
<a href="so98_aidsupdate.html">AIDSLINE - 1999 Weekly Update Schedule on ELHILL</a>
</p>
<p>
<a href="so98_healthstarupdate.html">HealthSTAR - 1999 Weekly Update Schedule on ELHILL</a>
</p>
<p>
<a href="so98_monthlyupdate.html">NLM Databases - 1999 Monthly Update Schedule on ELHILL</a>
</p>
<p>
1999 NLM Pricing Algorithm Chart - [This link was removed because it is no longer valid.]
</p>
<p>
<a href="so98_pricesched.html">MEDLARS Pricing Schedule</a>
</p>
<hr />
<center><strong><a href="/pubs/techbull/current_issue.html">Current Issue</a></strong></center>
<hr />
<center><strong><a href="/pubs/techbull/tb.html">Home</a></strong></center>
<hr />
<center><strong><a href="/pubs/techbull/back_issues.html">Back Issues</a></strong></center>
<hr />
<center><strong><a href="/pubs/techbull/new_index.html">Index</a></strong></center>
<hr />
</td>
<td valign=top width=440>
<h3>The Year 2000 Solution for ELHILL&#174; and the MEDLARS&#174; Databases</h3>
<p>
[Editor's Note: This article is a technical presentation of how Year 2000 compliance was implemented for NLM's ELHILL databases. Please see the <a href="http://www.nlm.nih.gov/pubs/techbull/so97/so97_yep.html"><em>Year-End Processing</em></a> article in the September-October 1997 <em>NLM Technical Bulletin</em> for search hints.]
</p>
<p>
In the Spring of 1997, the NLM Information Retrieval System (IRS) ELHILL was providing access to approximately 35 databases holding 20 million citations and occupying about 40 gigabytes of disk storage. These data came from a variety of sources, both internal and external to the NLM, and were processed through standard MEDLARS programs and individualized conversion programs.
</p>
<p>
The Office of Computer and Communications Systems (OCCS) was tasked with making all the computer systems, both hardware and software, Year 2000 compliant, as mandated by law. Because the current retrieval system was expected to be replaced within another 1 1/2 to 2 years, the solution had to take less time than that to implement.
</p>
<p>
MEDLARS has four basic types of date fields:
</p>
<dl compact>
<dt> a)</dt>
<dd>two-digit fields consisting of just the last two digits of the year,
e.g., Year (YY) --- '98'
</dd>
<dt>b)</dt>
<dd>four-digit fields consisting of the last two digits of the year and the month,
e.g., Entry Month (YYMM) --- '9805' --- May 1998
</dd>
<dt>c)</dt>
<dd>six-digit fields consisting of the last two digits of the year, the month, and the day,
e.g., Date of Entry (YYMMDD) --- '980529' --- May 29, 1998
</dd>
<dt>d)</dt>
<dd>fields of four or more characters beginning with the full representation of the year, including the Century (CC),
e.g., Date of Publication (CCYY......) --- '1998', '1998 May 29', or '1998 Spring', etc.
</dd>
</dl>
<p>
Only the first three needed to be adjusted for both retrieval and display to the user; the fourth was already Year 2000 compliant. One additional factor had to be addressed: ranging. ELHILL allows ranging in the form of 'less than x', 'greater than y', and 'from x to y', where 'x' and 'y' represent whole numbers. The overwhelming majority of ranging in the ELHILL IRS is on dates of the forms (a), (b), and (c), as shown above. Clearly, a ranging operation using 'from 99 to 01' would be illegal because the upper bound is less than the lower bound. Therefore, in addition to direct searching and display, numeric ranging had to be addressed.
</p>
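<p>
The ranging problem can be illustrated with a minimal Python sketch. The function <code>in_range</code> and the sample lists are hypothetical and are not part of ELHILL; the sketch only shows why the bounds of 'from 99 to 01' are inverted when years are held as two digits. ELHILL itself would treat such a request as illegal rather than return an empty result.
</p>

```python
def in_range(value, low, high):
    # Inclusive 'from low to high' test on whole numbers.
    return high >= value >= low


# The years 1998 through 2001 held as two-digit (YY) values.
two_digit_years = [98, 99, 0, 1]
# The same years held with the Century (CCYY).
four_digit_years = [1998, 1999, 2000, 2001]

# 'from 99 to 01' on YY values: the upper bound is below the lower bound,
# so no value can satisfy the test.
matches_yy = [y for y in two_digit_years if in_range(y, 99, 1)]

# The same request expressed with the Century behaves as the user expects.
matches_ccyy = [y for y in four_digit_years if in_range(y, 1999, 2001)]

print(matches_yy)
print(matches_ccyy)
```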
<p>
The main aim of the solution was to avoid changing the data in the citation while giving the user the appearance that the data had been changed. Because almost all of the data in the MEDLARS databases were published from the 1960s onward, MEDLARS offers a special case that might not be available to other systems. With the exception of a special presentation rule for display (printing) in the ELHILL IRS, all the necessary changes could be made in the File Generation and Maintenance (FGM) job stream(s) that build and maintain the databases.
</p>
<p>
The FGM Subsystem of MEDLARS is composed of a series of programs, sorts, and merges, which process new and maintained citations and:
</p>
<dl compact>
<dt> a)</dt>
<dd>build and maintain the citation itself, creating two sequential files of intermediate index points and ranging points,</dd>
<dt>b)</dt>
<dd>add enrichment data such as Medical Subject Heading (Trees), Pre-Explosions, etc., to the intermediate indexing points,</dd>
<dt>c)</dt>
<dd>merge the ranging values and (now-enriched) index points with the existing indexes and ranging file(s) to complete the building process.</dd>
</dl>
<p>
The solution consists of a single sort and a new program to run between steps (a) and (b) immediately above, as follows:
</p>
<dl compact>
<dt>a)</dt>
<dd>the sort, driven by specification, creates a duplicate file of index points for those fields to be enriched with the Century (CC) representation. Note that the original fields without the Century representation are untouched, so the user can still search without the Century representation. Because the index points, whether numeric or not, are represented as characters rather than as binary or decimal values, the first character is examined: if it is a '0', a '20' is prepended to the original data; if it is not a '0', a '19' is prepended. This change, good until 2009 (well after the current system is to be replaced), allows the user to search for the year 1998 as either '98' or '1998', and for the year 2001 as either '01' or '2001'. These updated index points are merged back with the intermediate index points and no other changes are necessary.</dd>
<dt>b)</dt>
<dd>the same program adjusts the ranging values, again driven by specification. Ranging presented a more difficult problem because the numeric data are represented as four-character binary fields and not as characters. It was therefore necessary to know whether the intermediate ranging data to be adjusted originated from a two-, four-, or six-character field. This was supplied in the specifications. The binary value in the field to be adjusted was compared to 10**(n-1), where 'n' was the number of digits in the original field (2, 4, or 6). If the value was less than 10**(n-1), then 20*10**n was added to the field; if greater or equal, then 19*10**n was added. For example, assume we are adjusting a two-character Year field containing the binary representation of '98'. '98' is greater than 10**(2-1), or 10, so 19*10**2 (1900) is added to '98', making it '1998'. If the value were '01', it would be found to be less than 10**(2-1), and 20*10**2 (2000) would be added to '01', making it '2001'. The same holds true for fields originally composed of 4 or 6 digits. These updated ranging points replace the original file of ranging points and no other changes are necessary.</dd>
<dt>c)</dt>
<dd>a print rule was written for ELHILL (there are over 30 of these presentation rules) which adjusts the field using the same logic as described in (a) above: depending on the first character, a '19' or a '20' is prepended to specified fields for presentation.</dd>
</dl>
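<p>
The two adjustment rules in (a) and (b) above can be sketched in Python as follows. This is an illustration only, not the MEDLARS program: the function names are hypothetical, and the ranging value is modeled as an ordinary integer rather than a four-character binary field.
</p>

```python
def add_century_to_index_point(point: str) -> str:
    # Character rule from (a), also used by the print rule in (c):
    # a leading '0' means 20xx, any other first character means 19xx.
    prefix = "20" if point.startswith("0") else "19"
    return prefix + point


def add_century_to_ranging_value(value: int, n: int) -> int:
    # Numeric rule from (b). 'n' is the number of digits in the original
    # field (2, 4, or 6), which the article says came from the specifications.
    if n not in (2, 4, 6):
        raise ValueError("n must be 2, 4, or 6")
    threshold = 10 ** (n - 1)
    # Values below 10**(n-1) had a leading zero in the original field.
    century = 19 if value >= threshold else 20
    return value + century * 10 ** n


print(add_century_to_index_point("98"))
print(add_century_to_index_point("01"))
print(add_century_to_ranging_value(98, 2))
print(add_century_to_ranging_value(1, 2))
print(add_century_to_ranging_value(9805, 4))
print(add_century_to_ranging_value(10529, 6))
```

<p>
Under these assumptions the two rules agree with each other: a value falls below 10**(n-1) exactly when its original character form began with '0'. The sketch also shows the stated limit of the approach: <code>add_century_to_index_point("10")</code> yields '1910', which is why the rule is described as good only until 2009.
</p>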
<p>
The requisite programming was written and tested in the Spring of 1997. Because the NLM replaces just about all of its databases, starting in the Summer, in a process known as Year-End Processing (YEP), it was decided to implement the change during that rebuilding process. All updated and now Year 2000 compliant databases were replaced in mid-December 1997, along with the IRS presentation changes, and the system has been running successfully since that time without error.
</p>
<p>
Because the data distributed to our tape recipients were unchanged, the algorithm described above was made available to them in the Fall of 1997, before the data were distributed in late December.
</p>
<dl>
<dd><em>--prepared by David Kenton, Database Administrator</em></dd>
<dd>Office of Computer and Communications Systems</dd>
</dl>
<br />
</td>
</tr>
<tr><td valign=top bgcolor="#CB9966" width=140>
<a href="/pubs/techbull/so98/so98_mesh.html">Next article</a>
<p>
<a href="/pubs/techbull/so98/so98_y2k.html">Table of contents</a>
</p>
<p>
<a href="/pubs/techbull/tb.html">Home</a>
</p>
<p>
<a href="/pubs/techbull/new_index.html">Index</a>
</p>
</td>
</tr>
</table>
<br />
<hr />
<!-- BEGIN NLM FOOTER -->
<center>
<font size="2" face="helvetica, arial"><a href="/nlmhome.html">U.S. National Library of Medicine</a>, 8600 Rockville Pike, Bethesda, MD 20894<br /><a href="http://www.nih.gov/">National Institutes of Health</a>, <a href="//www.hhs.gov/">Department of Health &amp; Human Services</a><br /><a href="/copyright.html">Copyright</a>, <a href="/privacy.html">Privacy</a>, <a href="/accessibility.html">Accessibility</a>, <a href="http://www.nih.gov/icd/od/foia/index.htm">Freedom of Information Act (FOIA)</a><br/><a href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html">HHS Vulnerability Disclosure</a>
<br />
<!-- ******************MODIFY "LAST UPDATED" ******************* -->
Last updated: 13 February 2004
</font>
</center>
<!-- END NLM FOOTER -->
<!-- ******************MODIFY EXPDATE AND EMAIL BELOW****************** -->
<!-- EXPDATE="2015-03-20" -->
<!-- EMAIL="nlmtechbull@mail.nlm.nih.gov" -->
<script src="//assets.nlm.nih.gov/jquery/jquery-latest.min.js"></script><script src="/core/nlm-notifyExternal/1.0/nlm-notifyExternal.min.js"></script></body>
</html>