<html>
<head>
<title>Sequence Identifiers: GI number and Accession.Version</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="author" content="NCBI_user_services">
<META NAME="keywords" CONTENT="sequence identifiers, UID, GI number, version, accession.version, track changes to sequences, revision history, glossary, definitions, dictionary, guide, education, teach, teacher">
<META NAME="description" CONTENT="Description of the two types of sequence identifiers used in sequence records, GI number and accession.version, and their use in tracking changes to sequences. Also summarizes the differences between the two types of identifiers, and provides a historical note about their implementation.">
<link rel="stylesheet" href="/corehtml/ncbi.css">
</head>

<body bgcolor="#FFFFFF" background="/corehtml/bkgd.gif" text="#000000" link="#CC6600" vlink="#CC6600">

<!-- ======================PAGE HEADER==================== -->

<table border="0" width="600" cellspacing="0" cellpadding="0">
<tr>
<td width="140"><a href="http://www.ncbi.nlm.nih.gov/"> <img src="http://www.ncbi.nlm.nih.gov/corehtml/left.GIF" width="130" height="45" border="0"></a></td>

<!-- ========TITLE OF PAGE (WITH SITEMAP LOGO GIF)========= -->
<td width="460" class="head1" valign="BOTTOM"> <span class="H1">Sequence Identifiers: A Historical Note</span></td>
<!-- td width="100" valign="center"><a href="ResourceGuide.html"><img SRC="sitemaplogo.gif" height="45" width="100" border="0"></a></td -->
<!-- ============END TITLE OF PAGE ================== -->

</tr>
</table>
<!-- ========================END PAGE HEADER==================== -->

<!-- ===============QUICKLINKS BAR WITHOUT SEARCH BOX============ -->
<table CLASS="TEXT" border="0" width="100%" cellspacing="0" cellpadding="3" bgcolor="#003366">
<tr CLASS="TEXT" align="CENTER">
<td width="16%"><a href="/entrez/" class="BAR">PubMed</a></td>
<td width="16%"><a href="/Entrez/" class="BAR">Entrez</a></td>
<td width="16%"><a href="/BLAST/" class="BAR">BLAST</a></td>
<td width="16%"><a href="/entrez/query.fcgi?db=OMIM" class="BAR">OMIM</a></td>
<td width="16%"><a href="/Taxonomy/taxonomyhome.html" class="BAR">Taxonomy</a></td>
<td width="16%"><a href="/Structure/" class="BAR">Structure</a></td>
</tr>
</table>
<!-- ============END QUICKLINKS BAR WITHOUT SEARCH BOX============ -->

<!-- ===========================CONTENTS=========================== -->

<table border="0" width="600" cellspacing="0" cellpadding="0">
<tr valign="TOP">

<!-- ===========================BLUE_SIDE_BAR======================= -->
<!-- left column -->
<td width="125">
<img src="http://www.ncbi.nlm.nih.gov/corehtml/spacer10.GIF" width="125" height="1" border="0">

<SPAN class="GUTTER1"><a href="http://www.ncbi.nlm.nih.gov/" class="GUTTER1">NCBI Home</a><br><br>

<SPAN class="GUTTER1"><a href="index.html" class="GUTTER1">Site Map</a><br>
<SPAN class="GUTTER2">
<a href="ResourceGuide.html" class="GUTTER2">Resource Guide</a><br>
<a href="AlphaList.html" class="GUTTER2">Alphabetical List</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#AboutNCBI" class="GUTTER1">About NCBI<br>
<SPAN class="GUTTER2">general and contact information</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#GenBank" class="GUTTER1">GenBank<br>
<SPAN class="GUTTER2">submit your sequence, general information</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Databases" class="GUTTER1">Molecular Databases<br>
<SPAN class="GUTTER2">nucleotides, proteins, structures and taxonomy</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Literature" class="GUTTER1">Literature Databases<br>
<SPAN class="GUTTER2">PubMed, PubRef, OMIM, Citation Matcher</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Genomes" class="GUTTER1">Genomes and Maps<br>
<SPAN class="GUTTER2">maps, the human genome and model organisms</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Tools" class="GUTTER1">Tools<br>
<SPAN class="GUTTER2">for data mining and analysis</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Research" class="GUTTER1">Research at NCBI<br>
<SPAN class="GUTTER2">people and projects</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#SoftwareEngineering" class="GUTTER1">Software Engineering<br>
<SPAN class="GUTTER2">Tools, R&amp;D and databases</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#Education" class="GUTTER1">Education<br>
<SPAN class="GUTTER2">teaching resources and on-line tutorials</a><br><br>

<SPAN class="GUTTER1"><a href="ResourceGuide.html#FTPSite" class="GUTTER1">FTP site<br>
<SPAN class="GUTTER2">download data and software</a><br>
</td>

<!-- extra column to force things over the gif border -->
<td width="15"><img src="http://www.ncbi.nlm.nih.gov/corehtml/spacer10.GIF" width="15" height="1" border="0"> </td>
<!-- right content column -->
<td width="460">
<p> </p>
<!-- ======================END_BLUE_SIDE_BAR======================= -->

<!-- ================TABLE STYLE FOR SUMMARY PAGES================= -->

<!-- ----------------------Regular Body Text -------------------- -->

<table BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH="460">
<tr>
<td CLASS="TEXT">

<p><b>Question</b>: </p>

<p>Why are there two types of sequence identification numbers (GI and VERSION), and what is the difference between them?</p>

<p><b>Answer</b>: </p>

<p>The two types of sequence identification numbers, <b>GI</b> and <b>VERSION</b>, have different formats and were implemented at different points in time.</p>

<ol>
<li>The <b>GI</b> number (sometimes written in lower case, "<b>gi</b>") is simply a series of digits assigned consecutively to each sequence record processed by NCBI. The GI number bears no resemblance to the accession number of the sequence record.</li><br>

<ul>
<li>the nucleotide sequence GI number is shown in the VERSION field of the database record</li><br>

<li>the protein sequence GI number is shown in the CDS/db_xref field of a nucleotide database record, and in the VERSION field of a protein database record</li>
</ul>

<br>
<li><b>VERSION</b> is made up of the accession number of the database record followed by a dot and a version number (and is therefore sometimes referred to as the "<b>accession.version</b>").</li><br>

<ul>
<li>a nucleotide sequence version contains two letters followed by six digits, a dot, and a version number (for older nucleotide sequence records, the format is one letter followed by five digits, a dot, and a version number)</li><br>

<li>a protein sequence version contains three letters followed by five digits, a dot, and a version number</li>
</ul>

</ol>
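
<p>The accession.version formats listed above can be checked mechanically. The following is a minimal sketch in Python, not an NCBI tool: the function name <code>parse_accession_version</code>, the assumption that the letters are upper case, and the sample identifiers are all illustrative.</p>

```python
import re

# Patterns follow the formats described in the list above.
# Assumption: the letters are upper case.
NUCLEOTIDE = re.compile(r"([A-Z]{2}[0-9]{6}|[A-Z][0-9]{5})\.([0-9]+)")
PROTEIN = re.compile(r"([A-Z]{3}[0-9]{5})\.([0-9]+)")


def parse_accession_version(identifier):
    """Return (kind, accession, version), or None if no format matches."""
    for kind, pattern in (("nucleotide", NUCLEOTIDE), ("protein", PROTEIN)):
        match = pattern.fullmatch(identifier)
        if match:
            return kind, match.group(1), int(match.group(2))
    return None


# Hypothetical identifiers, chosen only to exercise each format.
for example in ("AF123456.2", "U12345.1", "AAB12345.1", "12345"):
    print(example, parse_accession_version(example))
```

<p>A bare series of digits such as a GI number matches neither pattern, which reflects the point made above that a GI number bears no resemblance to the accession number.</p>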

<p>The GI number has been used for many years by NCBI to track sequence histories in GenBank and the other sequence databases it maintains. The VERSION system of identifiers was adopted in February 1999 by the International Nucleotide Sequence Database Collaboration (GenBank, EMBL, and DDBJ). More details are given in the historical note below.</p>

<p>The two systems of identifiers run in parallel to each other. That is, when any change is made to a sequence, the sequence receives a new GI number AND an increase to its version number.</p>
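
<p>The parallel behaviour of the two systems can be illustrated with a small model. This is a sketch only: the <code>SequenceIds</code> class, the <code>revise</code> function, the accession, and the starting GI value are hypothetical and do not describe how NCBI actually assigns identifiers internally.</p>

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class SequenceIds:
    accession: str
    version: int
    gi: int

    @property
    def accession_version(self):
        return f"{self.accession}.{self.version}"


def revise(ids, gi_source):
    """Model a sequence change: same accession, version + 1, and a new GI."""
    return SequenceIds(ids.accession, ids.version + 1, next(gi_source))


# Hypothetical consecutive GI source and accession.
gi_source = count(5000)
original = SequenceIds("AF123456", 1, next(gi_source))
updated = revise(original, gi_source)
print(original.accession_version, original.gi)
print(updated.accession_version, updated.gi)
```

<p>In this model the accession stays fixed across revisions, while the version number and the GI number both change each time the sequence changes.</p>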

<p>A <a href="http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi">Sequence Revision History</a> tool is available to track the GI numbers, version numbers, and update dates of the sequences that have appeared in a specific GenBank record (<a href="sequencerevisionhistory.html">more information and example</a>).</p>

<hr><br>

<p><b>Historical Note</b>: </p>

<p>The first type of sequence identification number was GI, which stands for "GenInfo Identifier." GenInfo was an early system used to access GenBank and related databases. A GI number was assigned to each nucleotide and protein sequence accessible through the NCBI search systems, and served as a means of tracking changes to the sequence. However, GI numbers were not used uniformly across the collaborating databases (GenBank, EMBL, DDBJ). They instead served as an internal tracking system for the databases that chose to implement them. In addition, the GI number for a nucleotide sequence originally appeared in the "Comment" field of a record. There was no separate field for sequence identification numbers.</p>

<p>When the collaborating databases began to formalize use of sequence identifiers, they created a new, separate field called <a href="samplerecord.html#NIDA">NID</a> (nucleotide identifier) in the database record, which contained the GI number of the nucleotide sequence. Similarly, the GI number for each protein sequence was named <a href="samplerecord.html#PIDA">PID</a>, and placed above each amino acid translation in the field: FEATURES/CDS/db_xref="PID:gNNNNNN". Hence, there were two types of GI numbers: NID and PID. In December 1999, the use of the abbreviations "NID" and "PID" was discontinued. Both are now just shown as "GI".</p>

<p>In February 1999, GenBank/EMBL/DDBJ implemented a new "<a href="samplerecord.html#VersionA">accession.version</a>" system of sequence identifiers that runs parallel to the GI number system. (See section 1.3.2 of the <a href="ftp://ncbi.nlm.nih.gov/genbank/release.notes/gb111.release.notes">GenBank 111.0 release notes</a> for details.)</p>

<p>Unlike the GI number system, in which sequence identification numbers were not necessarily consistent across the databases (e.g., GenBank and EMBL could each assign their own GI number to a sequence), the new system is designed to ensure consistency. It is also designed to show a relationship between a sequence identification number and the accession number of the record in which it is found. In contrast, GI numbers are assigned consecutively and bear no resemblance to the accession number. Finally, the new system allows the assignment of alphanumeric protein IDs to protein translations within nucleotide sequence records. The protein IDs contain three letters followed by five digits, a period, and a version number.</p>

<p>As of December 1999 (GenBank release 115.0):</p>
<ul>
<li>the NID field and /db_xref="PID:xxxxxxx" qualifier have been removed, and both are now simply shown as "GI" numbers</li>
<li>the VERSION field of nucleotide records will continue to contain both an accession.version and a GI number for the nucleotide sequence</li>
<li>each amino acid translation will continue to be labeled with an accession.version sequence identifier (in the "/protein_id" field) and a GI number (in the "/db_xref=GI:xxxxxxx" qualifier), under the CDS feature of a GenBank record</li>
<li>the accession.version and GI systems of sequence identifiers will run in parallel to each other. Therefore, when any change is made to a sequence, it receives a new GI number AND an increase to its version number.</li>
</ul>
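
<p>The field layout described in the list above can be read with a few regular expressions. The sketch below is illustrative only. The sample text is a made-up fragment, not a real GenBank record, and its exact spacing is an assumption. The function name <code>extract_identifiers</code> is hypothetical, and pairing each /protein_id with the /db_xref GI that follows it assumes every CDS feature carries both qualifiers in that order.</p>

```python
import re

# Made-up record fragment; identifiers and spacing are assumptions.
SAMPLE = "\n".join([
    "VERSION     AF123456.1  GI:1234567",
    "     CDS             1..300",
    '                     /protein_id="AAB12345.1"',
    '                     /db_xref="GI:7654321"',
])


def extract_identifiers(record_text):
    """Collect (accession.version, GI) pairs for the nucleotide and its proteins."""
    result = {"nucleotide": None, "proteins": []}
    version_line = re.search(
        r"^VERSION\s+(\S+)\s+GI:(\d+)", record_text, re.MULTILINE
    )
    if version_line:
        result["nucleotide"] = (version_line.group(1), int(version_line.group(2)))
    protein_ids = re.findall(r'/protein_id="([^"]+)"', record_text)
    protein_gis = re.findall(r'/db_xref="GI:(\d+)"', record_text)
    # Naive pairing by order of appearance (see the assumption stated above).
    result["proteins"] = [(p, int(g)) for p, g in zip(protein_ids, protein_gis)]
    return result


print(extract_identifiers(SAMPLE))
```

<p>For real records, a maintained parser is a safer choice than hand-written patterns, since features can carry other db_xref qualifiers and may lack a protein_id.</p>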

<p>For more information, see section 3.4.7 of the <a href="ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt">current GenBank release notes</a>.</p>

<hr>

<p>Back to <a href="samplerecord.html">sample record</a>.</p>
</td>
</tr>
</table>

<br>

<!-- ============END TABLE STYLE FOR SUMMARY PAGES================= -->

<!-- ============PLACE_EXTRA_TITLE_BARS_ABOVE_HERE================ -->
</td>

</tr>
</table>
<p>

<!-- ===================END_OF_CONTENT============================ -->

<table BORDER=0 CELLSPACING=0 CELLPADDING=3 WIDTH="100%" BGCOLOR="#003366" >
<tr ALIGN=CENTER>

<td WIDTH="20%"><a href="mailto:info@ncbi.nlm.nih.gov" class="BAR">Help Desk</a></td>

<td WIDTH="20%"><a href="http://www.ncbi.nlm.nih.gov" class="BAR">NCBI</a></td>

<td WIDTH="20%"><a href="http://www.nlm.nih.gov" class="BAR">NLM</a></td>

<td WIDTH="20%"><a href="http://www.nih.gov" class="BAR">NIH</a></td>

<td WIDTH="20%"><a href="credits.html" class="BAR">Credits</a></td>

</tr>
</table>

<table BORDER=0 CELLSPACING=0 CELLPADDING=4 WIDTH="600">
<tr>
<td width="145"> </td>
<td width="455"><i><FONT size="2"> Revised June 14, 2004</FONT></i><br>
<FONT size="2"><i>Questions about NCBI resources to</i> <a href="mailto:info@ncbi.nlm.nih.gov">info@ncbi.nlm.nih.gov</a></FONT><br>
<FONT size="2"><i>Comments about site map to Renata Geer</i> <a href="mailto:renata@ncbi.nlm.nih.gov">renata@ncbi.nlm.nih.gov</a></FONT></p>
</td>
</tr>
</table>

</body>
</html>