<html>

<head>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
<meta name="generator">
<title>NCBI News | Fall/Winter 2000</title>
<style type="text/css">
<!--
a:hover { color: #993300; }
-->
</style>
</head>

<body background="images/bckgrnd.gif" bgcolor="white" link="#003399" alink="#003399" vlink="#003399" text="black">
<span class="heads"></span> <span class="subheads"></span>
<table border="0" cellpadding="0" cellspacing="0" width="673" valign="left">
<tr height="176">
<td height="176" colspan="2" valign="left" align="left"><img height="12" width="8" src="images/dotclear.gif"><img height="171" width="173" src="images/logo.gif" alt="NCBI Logo"></td>
<td height="176" valign="top" width="10" align="left"></td>
<td width="475" height="176" valign="top"><img height="80" width="364" src="images/msthd1.gif" border="0" alt="NCBI News" usemap="#E"><map name="E"><area href="http://www.ncbi.nlm.nih.gov/About/newsletter.html" coords="1,16,362,71" shape="rect"></map><br>
<img height="80" width="340" src="images/msthd1a.gif" border="0" alt="National Center for Biotechnology Information" usemap="#NCBI"><map name="NCBI"><area href="http://www.nih.gov" coords="0,63,133,77" shape="rect"><area href="http://www.nlm.nih.gov" coords="0,41,138,53" shape="rect"><area href="http://www.ncbi.nlm.nih.gov" coords="0,14,248,26" shape="rect"></map>
<img height="80" width="114" src="images/edition.gif" alt="Fall/Winter 2000"></td>
</tr>
<tr valign="top">
<td width="13" align="left" valign="top"><img height="1" width="1" src="images/dotclear.gif"></td>
<td width="160" align="left" valign="top"><font size="2" face="Arial,Helvetica,sans-serif"><br>
<br>
<img height="33" width="178" src="images/issue.gif" alt="In this issue"><br>
<br>
<b><font face="Arial, Helvetica, sans-serif" color="003399"><font color="#000000">The Human <br>
Genome Sequence</font></font></b></font>
<p><b><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><a href="blink.html">BLink Enhances<br>
Entrez Exploration</a></font></b></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="nomenclature.html">Human Gene<br>
Nomenclature</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="faqs.html">FAQs</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="pubs.html">Recent Publications</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="standalone.html">Standalone <br>
BLAST Additions</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="blastlab.html">BLAST Lab</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="mirrorftp.html">Mirror FTP Site<br>
for GenBank</a></b></font></p>
<p><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><b><a href="masthead.html"><span class="subheads"><span class="subheads">Ma<span class="heads">sthead</span></span></span></a>
</b></font><b><font face="Arial, Helvetica, sans-serif" size="2" color="003399">
</font></b>
</td>
<td width="10" valign="left"> </td>
<td width="475">
<div valign="left">
<p><br>
<br>
<font face="Arial, Helvetica, sans-serif" size="3" color="003399"><b>The Human Genome Sequence: NCBI’s First Annotated Edition</b></font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000"><br>
The NCBI recently released its first assembled and annotated view of
the human genome sequence. The assembly is based not only on the finished
and draft sequence deposited in GenBank by the public sequencing centers,
but also on the thousands of sequences contributed to GenBank over the
years by individual scientists around the world. Hence, this resource
represents a true international public effort to sequence the human
genome.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">Updated
assemblies that incorporate new data, fill in existing gaps,
and increase overall accuracy will be released to the public
on a regular basis. The human genome data can be viewed on the Web with
NCBI’s human genome Map Viewer or downloaded in bulk via FTP.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000"><br>
<b><font face="Arial, Helvetica, sans-serif" color="003399" size="2">Assembly</font></b></font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">
NCBI’s assembly process starts with the entire complement of human
genomic sequence in GenBank, both draft and finished. Assembling and
ordering the individual sequence units is a critical phase of the Human
Genome Project. It involves many different steps, including screening
for vector and other sequence contamination, before merging the
input data into ordered segments of DNA referred to as contigs. This
first build presents more than 6,000 contigs, representing roughly
2.8 billion base pairs. Nearly 700 contigs are longer than 1 Mb. Over
75 percent of the bases in the contigs are in unbroken segments longer
than 30 kb, the size of a typical human gene.<br>
<br>
<br>
</font><font face="Times New Roman, Times, serif" size="3" color="#000000">
</font><font face="Times New Roman, Times, serif" size="3" color="#003399"><i>
</i></font></p>
<table width="100%" border="0" cellspacing="0" cellpadding="1" bgcolor="#FFFFFF">
<tr bgcolor="003399">
<td>
<table border="0" cellspacing="0" cellpadding="11" bgcolor="#FFFFFF" width="100%">
<tr bordercolor="003399">
<td><font face="Arial, Helvetica, sans-serif" size="3" color="003399"><b><font size="2">Model
Sequences</font></b></font><font size="2"><b><font face="Arial, Helvetica, sans-serif" color="003399">
Get New Accession Numbers</font></b></font><b><font face="Arial, Helvetica, sans-serif" size="3" color="003399"><br>
<br>
</font></b><font face="Times New Roman, Times, serif" size="3" color="#000000">The
NCBI assembly process produces a new kind of sequence record
termed a “model sequence.” Model mRNA records are
created <i>de novo</i> from human genomic sequence and aligned
to mRNA reference sequences from RefSeq. Because such alignments
may contain some mismatches, model sequences are assigned
their own accession numbers, in the format XM_12345 for mRNA
and XP_12345 for the corresponding model protein sequence.<br>
<br>
The alignment-based evidence for the model sequences is provided
through AceView, a new service currently accessed from LocusLink
and the Map Viewer. AceView shows a predicted gene, its intron/exon
structure, and its alignment to the corresponding RefSeq mRNA
sequence.</font><font face="Times New Roman, Times, serif" size="3" color="#000000"><br>
</font><font size="2" face="Times New Roman, Times, serif"></font></td>
</tr>
</table>
</td>
</tr>
</table>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">
<br>
<b><font face="Arial, Helvetica, sans-serif" color="003399" size="2">Annotation</font></b></font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">NCBI
is also engaged in the essential process of annotating the human genomic
sequence, that is, labeling its biologically important areas.
Human gene annotation comprises two major tasks: the correct placement
of known human genes into their proper genomic context, and the prediction
of new, previously unknown genes from the genomic sequence.
</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">For
the first task, the mRNAs from the NCBI RefSeq collection are placed
on the genome primarily by alignment, with compensation for various
problems in both the genomic and mRNA sequences, and reconciliation
of close paralogs and pseudogenes. In this first release on the NCBI
Web site, 8,800 of the 10,500 RefSeq mRNAs were placed
on the genome.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">For
the second task, multiple lines of evidence, including EST alignments,
splice junctions, protein similarities, and other methods,
are combined to predict new genes. The predicted mRNAs and proteins
will be subject to change with improved data and better algorithms.
Nonetheless, NCBI will do its best to keep the same accession numbers
with the same predicted genes from build to build. A new release containing
both known gene placements and predicted gene models was in process
as this article went to press.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">Additional
biological features are also being annotated on the genomic sequence.
This first release includes more than 1.3 million SNPs and 111,851
STS markers.<br>
</font> </p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000"><b><font face="Arial, Helvetica, sans-serif" size="2" color="003399"><br>
Public Access</font></b></font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">Selecting
the Contig map in NCBI’s human genome Map Viewer displays the contigs
used to assemble the sequence. SNP data may be viewed on the
SNP map. The Map Viewer may be used to further explore the human genome
data by viewing up to 7 parallel maps selected from a palette of nineteen,
including 6 sequence maps, 5 cytogenetic maps, 2 genetic maps,
and 6 radiation hybrid maps.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">The
data is also available for downloading from the “genomes/H_sapiens”
directory of the NCBI FTP site.</font></p>
<p><font face="Times New Roman, Times, serif" size="3" color="#000000">The
FTP site includes the contigs produced by the NCBI assembly, RefSeq
and model mRNA sequences annotated on the genome, and information used
by the Map Viewer to generate and display the palette of nineteen maps
mentioned above. </font><font face="Arial, Helvetica, sans-serif" size="2" color="#000000"><i>—DW,
CB, JO<br>
<br>
<br>
</i></font></p>
<table width="100%" border="0" cellspacing="0" cellpadding="1" bgcolor="#FFFFFF">
<tr bgcolor="003399">
<td>
<table border="0" cellspacing="0" cellpadding="11" bgcolor="#99CCFF" width="100%">
<tr bordercolor="003399">
<td width="100%" height="100%"><font face="Arial, Helvetica, sans-serif" size="3" color="003399"><font size="4" face="Times New Roman, Times, serif"><i><font color="#FFFFFF">What
is Draft Sequence?</font></i></font><font face="Times New Roman, Times, serif" color="#FFFFFF" size="4"><br>
</font><font face="Times New Roman, Times, serif" color="#000000">
<font face="Arial, Helvetica, sans-serif" size="2"><br>
Two-thirds of the human genomic sequence in GenBank is termed
“draft” or “unfinished.” These sequences
may consist of many unordered pieces and are of lower
quality than a typical</font></font></font> <font face="Arial, Helvetica, sans-serif" size="2" color="#000000">“finished”
GenBank sequence. The finishing process involves closure of
sequence gaps, determination of proper order and orientation,
and resolution of any sequencing ambiguities and errors. This
is an ongoing process in the sequencing centers of the Human
Genome Project, and NCBI updates draft sequence on a
daily basis.<br>
<br>
Draft sequence is placed in the HTG (High Throughput Genomic)
division of GenBank. A typical HTG record consists of all
sequence data generated from a single cosmid, BAC, YAC, or
P1 clone. A single accession number is assigned to this collection
of HTG sequences. Each record includes a clear indication
of its status (Phase 1 or Phase 2) and a prominent
warning that the sequence data is “unfinished” and
may contain errors. Phase 1 indicates an unfinished sequence
with gaps and unknown order and orientation of the pieces.
In Phase 2, the order and orientation of the pieces are known,
but the length of the gaps may still be unknown. Finished
sequence data, consisting of one continuous piece of
high-quality DNA sequence, is moved out of the HTG division
and placed in the Mammalian division of GenBank. Contigs from
the NCBI human genome assembly contain finished as well as
draft sequence.<br>
</font><font face="Arial, Helvetica, sans-serif" size="3" color="#000000">
</font></td>
</tr>
</table>
</td>
</tr>
</table>
<p><font face="Times New Roman, Times, serif" size="3" color="#003399"><i>
<br>
</i></font><br>
<font face="Times New Roman, Times, serif" size="3" color="#003399"><i>
</i></font><font face="Times New Roman, Times, serif" size="3" color="#003399"><i>
<br>
</i></font><font face="Times New Roman, Times, serif" size="3" color="#000000">
</font></p>
<p> <font face="Times New Roman, Times, serif" size="3" color="#000000">
</font> </p>
<p align="right"><a href="blink.html"><img height="27" width="69" src="images/continue.gif" border="0" alt="Continue"></a><br>
<div align="right"><font color="#003399"> </font></div>
<font color="#003399">
<hr noshade size="1" align="right">
</font>
<div align="right"><img height="32" width="187" src="images/fallwinter_foot.gif" border="0" alt="NCBI News | Fall/Winter 2000" usemap="#NCBI News foot"><map name="NCBI News foot"><area href="http://www.ncbi.nlm.nih.gov/About/newsletter.html" coords="0,9,196,34" shape="rect"></map><br>
</div>
</div>
</td>
</tr>
</table>
</body>

</html>