<!DOCTYPE html
|
|
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
|
|
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
|
|
<head>
|
|
<title>Annotation Examples</title>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
|
|
<link href="https://www.ncbi.nlm.nih.gov/coreweb/styles/header_footer.css" rel="stylesheet" type="text/css" />
|
|
<link href="../css/bankit.13.6.css" rel="stylesheet" type="text/css" title="default" />
|
|
<link href="../css/tabmenu.13.6.css" rel="stylesheet" type="text/css" />
|
|
<link href="../css/scrolltable.13.6.css" rel="stylesheet" type="text/css"
|
|
title="default" />
|
|
<link rel="stylesheet" id="sugstyle"
|
|
href="https://www.ncbi.nlm.nih.gov/coreweb/styles/ncbisuggest.css"
|
|
type="text/css" />
|
|
</head>
|
|
<body>
|
|
<!-- Header -->
|
|
<!-- accesskey is "3" because that's what the NCBI home page uses -->
|
|
<a accesskey="3" href="#maincontent" class="skipnav"
|
|
tabindex="0" title="Skip to the content">
|
|
Skip to main content</a>
|
|
<div id="head">
|
|
<dl id="logo">
|
|
<dt><a href="/" title="NCBI Home">National Center for Biotechnology Information</a></dt>
|
|
<dd>
|
|
<ul>
|
|
<li><a href="#">Home</a></li>
|
|
<li><a href="#">Search</a></li>
|
|
<li class="lastone"><a href="#">Site Map</a></li>
|
|
</ul>
|
|
</dd>
|
|
</dl>
|
|
<p id="project"><!-- BANNER --> </p>
|
|
<div id="nav"> </div>
|
|
</div>
|
|
<!-- end of Header -->
|
|
|
|
<!-- Content -->
|
|
<div id="maincontent">
|
|
<a name="top"></a>
|
|
<h1>Annotation Examples</h1>
|
|
<ul>
|
|
<li><a href="#mrna">mRNA sequence</a></li>
|
|
<li><a href="#prok">Prokaryotic gene</a></li>
|
|
<li><a href="#euka">Eukaryotic gene</a></li>
|
|
<li><a href="#prom">Promoter region</a></li>
|
|
<li><a href="#viral">Viral sequence</a></li>
|
|
<li><a href="#hiv1">HIV-1</a></li>
|
|
<li><a href="#trans">Transposon or insertion sequence</a></li>
|
|
<li><a href="#micro">Microsatellite sequence</a></li>
|
|
<li><a href="#repeat">Repeat regions</a></li>
|
|
<li><a href="#pseudo">Pseudogene</a></li>
|
|
<li><a href="#fusion">Translocation and/or fusion protein</a></li>
|
|
<li><a href="#clone">Cloning vector</a></li>
|
|
<li><a href="#gapseq">Gapped sequence</a></li>
|
|
<li><a href="#popset">Phylogenetic or population set</a></li>
|
|
<li><a href="#est">EST submissions</a></li>
|
|
<li><a href="#gss">GSS submissions</a></li>
|
|
<li><a href="#sts">STS submissions</a></li>
|
|
<li><a href="#htgs">HTGS submissions</a></li>
|
|
<li><a href="#flics">FLICs submissions</a></li>
|
|
</ul>
|
|
<a name="mrna"></a>
|
|
<div>
|
|
<h2>mRNA sequence</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for an mRNA (cDNA) sequence encoding a protein:</strong>
|
|
<ul>
|
|
<li>coding region intervals, including start and stop codons</li>
|
|
<li>protein name</li>
|
|
<li>gene name, if available</li>
|
|
<li>amino acid sequence, if available</li>
|
|
</ul>
|
|
<br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p>
|
|
<pre>Homo sapiens prolidase (PEPD) mRNA, complete cds.
|
|
|
|
source 1..1888
|
|
/organism="Homo sapiens"
|
|
/chromosome="19"
|
|
/map="19q12-q13.2"
|
|
/cell_type="fibroblasts"
|
|
|
|
gene 1..1888
|
|
/gene="PEPD"
|
|
|
|
CDS 17..1498
|
|
/gene="PEPD"
|
|
/EC_number="3.4.13.9"
|
|
/note="imidodipeptidase"
|
|
/product="prolidase"
|
|
|
|
</pre>
|
|
</p>
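<p>A quick arithmetic check can catch mistyped coding region intervals before submission. The sketch below is illustrative only: the function names are hypothetical and are not part of BankIt. It assumes 1-based, inclusive intervals, as in the example above, and relies on the fact that a complete coding region (start codon through stop codon) spans a whole number of codons.</p>

```python
def span_length(start, end):
    """Length of a 1-based, inclusive feature interval such as 17..1498."""
    return end - start + 1


def is_whole_codon_span(start, end):
    """True when the interval covers a whole number of codons."""
    return span_length(start, end) % 3 == 0


# CDS interval from the PEPD example above.
cds_start, cds_end = 17, 1498
length = span_length(cds_start, cds_end)

print(length)                                   # 1482
print(is_whole_codon_span(cds_start, cds_end))  # True
print(length // 3 - 1)                          # 493 residues, stop codon excluded
```

<p>An interval that fails this check is not necessarily wrong (a partial coding region need not span whole codons), but for a sequence described as "complete cds" it usually points to an off-by-one error in the start or stop position.</p>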
|
|
</div>
|
|
<a name="prok"></a>
|
|
<div>
|
|
<h2>Prokaryotic gene</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a prokaryotic genomic sequence encoding a protein:</strong>
|
|
<ul>
|
|
<li>coding region intervals, including start and stop codons, if present</li>
|
|
<li>protein name</li>
|
|
<li>gene name, if known</li>
|
|
<li>amino acid sequence, if known</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p>
|
|
<pre>Escherichia coli RecA protein (recA) gene, complete cds.
|
|
|
|
source 1..3300
|
|
/organism="Escherichia coli"
|
|
/strain="K-12"
|
|
|
|
gene 783..1961
|
|
/gene="recA"
|
|
|
|
CDS 783..1961
|
|
/gene="recA"
|
|
/function="DNA repair protein"
|
|
/product="RecA protein"
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="euka"></a>
|
|
<div>
|
|
<h2>Eukaryotic gene</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a eukaryotic genomic sequence encoding a protein:</strong>
|
|
<ul>
|
|
<li>coding region intervals, including start and stop codons, if
|
|
present, and all exon intervals</li>
|
|
<li>protein name</li>
|
|
<li>gene name, if known</li>
|
|
<li>amino acid sequence, if known</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p><pre>Caenorhabditis elegans tyrosine kinase PTK-2 (ptk-2) gene, complete cds.
|
|
|
|
source 1..3180
|
|
/organism="Caenorhabditis elegans"
|
|
|
|
gene 211..3011
|
|
/gene="ptk-2"
|
|
|
|
mRNA join(211..288,533..703,763..890,940..1024,
|
|
1084..1380,1838..1962,2018..2099,2301..3011)
|
|
/gene="ptk-2"
|
|
/product="protein kinase PTK-2"
|
|
|
|
CDS join(250..288,533..703,763..890,940..1024,
|
|
1084..1380,1838..1962,2018..2099,2301..2456)
|
|
/gene="ptk-2"
|
|
/product="protein kinase PTK-2"
|
|
</pre>
|
|
</p>
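<p>The relationship between exon intervals, the spliced coding length, and the introns can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the parser only understands fully specified spans (it does not handle partial markers or <code>complement()</code>). It uses the CDS intervals from the ptk-2 example above.</p>

```python
import re


def parse_spans(location):
    """Return (start, end) pairs from a location such as
    'join(250..288,533..703)' or a simple span such as '783..1961'."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]


def total_length(intervals):
    """Number of bases covered by 1-based, inclusive intervals."""
    return sum(end - start + 1 for start, end in intervals)


def introns(intervals):
    """Intervals lying between consecutive exon segments."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:])]


cds = ("join(250..288,533..703,763..890,940..1024,"
       "1084..1380,1838..1962,2018..2099,2301..2456)")
exons = parse_spans(cds)

print(total_length(exons))      # 1083
print(total_length(exons) % 3)  # 0, so the spliced CDS is a whole number of codons
print(introns(exons)[0])        # (289, 532)
```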
|
|
</div>
|
|
<a name="prom"></a>
|
|
<div>
|
|
<h2>Promoter region</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for promoter, genomic 5' flanking sequence, or genomic 3' flanking sequence:</strong>
|
|
<ul>
|
|
<li>protein or gene name for the sequence to which the promoter or
|
|
flanking region belongs</li>
|
|
<li>intervals of any transcribed regions or coding regions, if present
|
|
on the sequence</li>
|
|
</ul>
|
|
<br />
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p><pre>Homo sapiens enhancer-binding protein 2 (EBP2) gene, promoter region and partial cds.
|
|
|
|
source 1..3061
|
|
/organism="Homo sapiens"
|
|
/chromosome="15"
|
|
/map="15q13"
|
|
/cell_line="H441"
|
|
/tissue_type="lung"
|
|
|
|
gene 1..>3061
|
|
/gene="EBP2"
|
|
|
|
promoter 1..2947
|
|
/gene="EBP2"
|
|
|
|
TATA_signal 2918..2923
|
|
/gene="EBP2"
|
|
|
|
mRNA 2948..>3061
|
|
/gene="EBP2"
|
|
/product="enhancer-binding protein 2"
|
|
|
|
5'UTR 2948..3010
|
|
/gene="EBP2"
|
|
|
|
CDS 3011..>3061
|
|
/gene="EBP2"
|
|
/product="enhancer-binding protein 2"
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="viral"></a>
|
|
<div>
|
|
<h2>Viral sequence</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a viral sequence:</strong>
|
|
<ul>
|
|
<li>include strain, serotype, host, country, and collection_date when known</li>
|
|
<li>coding region intervals, including start and stop codons, if present</li>
|
|
<li>protein name</li>
|
|
<li>gene name, if known</li>
|
|
<li>amino acid sequence, if known</li>
|
|
<br/><br/>
|
|
<li>if no coding region is present, other description of the sequence</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
|
|
<h3>Example:</h3>
|
|
<p><pre>Human adenovirus 3 strain RKI-4263/07 hexon (H) gene, partial cds.
|
|
|
|
source 1..1520
|
|
/organism="Human adenovirus 3"
|
|
/mol_type="genomic DNA"
|
|
/strain="RKI-4263/07"
|
|
/serotype="3"
|
|
/host="Homo sapiens"
|
|
/db_xref="taxon:45659"
|
|
/country="Germany"
|
|
/collection_date="Apr-2007"
|
|
|
|
gene <1..>1520
|
|
/gene="H"
|
|
|
|
CDS <1..>1520
|
|
/note="major capsid protein"
|
|
/codon_start=1
|
|
/product="hexon"
|
|
</pre>
|
|
</p>
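<p>For a partial coding region, the <code>/codon_start</code> qualifier states which base of the interval begins the first complete codon. The sketch below shows the arithmetic, assuming 1-based, inclusive intervals; <code>codon_layout</code> is a hypothetical helper written for this illustration.</p>

```python
def codon_layout(start, end, codon_start=1):
    """Return (complete_codons, leftover_bases) for a CDS interval.

    codon_start is 1, 2, or 3: the position within the interval of the
    first base of the first complete codon.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2, or 3")
    coding_bases = (end - start + 1) - (codon_start - 1)
    return divmod(coding_bases, 3)


# Hexon example above: 1520 bases, partial at both ends, /codon_start=1.
print(codon_layout(1, 1520, codon_start=1))  # (506, 2)
```

<p>Here the interval holds 506 complete codons followed by two bases of an incomplete final codon, which is consistent with a coding region that is partial at its 3' end.</p>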
|
|
</div>
|
|
|
|
<a name="hiv1"></a>
|
|
<div>
|
|
<h2>HIV-1</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for an HIV-1 sequence:</strong><br/>
|
|
<ul>
|
|
<li>name of the country from which the virus was isolated</li>
|
|
<li>clone and isolate information</li>
|
|
<br/><br/>AND
|
|
<br/><br/>
|
|
<li>coding region intervals, including start and stop codons, if
|
|
present</li>
|
|
<li>protein names</li>
|
|
<li>gene names, if known</li>
|
|
<li>amino acid sequences, if known</li>
|
|
<br/><br/>
|
|
OR
|
|
<br/><br/>
|
|
<li>if no coding region is present, other description of the sequence</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
|
|
<h3>Example:</h3>
|
|
<p CLASS="TEXT"><pre>HIV-1 isolate X clone 5601 from USA, complete genome.
|
|
|
|
source 1..9720
|
|
/organism="Human immunodeficiency virus type 1"
|
|
/clone="5601"
|
|
/isolate="X"
|
|
/country="USA"
|
|
|
|
repeat_region 1..634
|
|
/rpt_type=long_terminal_repeat
|
|
|
|
gene 789..2291
|
|
/gene="gag"
|
|
|
|
CDS 789..2291
|
|
/gene="gag"
|
|
/product="gag protein"
|
|
|
|
gene 2084..5095
|
|
/gene="pol"
|
|
|
|
CDS 2084..5095
|
|
/gene="pol"
|
|
/product="pol protein"
|
|
|
|
gene 5040..5618
|
|
/gene="vif"
|
|
|
|
CDS 5040..5618
|
|
/gene="vif"
|
|
/product="vif protein"
|
|
|
|
gene 5558..5848
|
|
/gene="vpr"
|
|
|
|
CDS 5558..5848
|
|
/gene="vpr"
|
|
/product="vpr protein"
|
|
|
|
gene 5829..8476
|
|
/gene="tat"
|
|
|
|
CDS join(5829..6043,8386..8476)
|
|
/gene="tat"
|
|
/product="tat protein"
|
|
|
|
gene 5968..8660
|
|
/gene="rev"
|
|
|
|
CDS join(5968..6043,8386..8660)
|
|
/gene="rev"
|
|
/product="rev protein"
|
|
|
|
gene 6060..6305
|
|
/gene="vpu"
|
|
|
|
CDS 6060..6305
|
|
/gene="vpu"
|
|
/product="vpu protein"
|
|
|
|
gene 6223..8802
|
|
/gene="env"
|
|
/pseudo
|
|
|
|
gene 8804..9070
|
|
/gene="nef"
|
|
|
|
CDS 8804..9070
|
|
/gene="nef"
|
|
/product="nef protein"
|
|
|
|
repeat_region 9086..9719
|
|
/rpt_type=long_terminal_repeat
|
|
|
|
polyA_signal 9612..9617
|
|
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="trans"></a>
|
|
<div>
|
|
<h2>Transposon or insertion sequence</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for transposons or insertion sequences:</strong>
|
|
<ul>
|
|
<li>specific name of the transposon or IS, if available</li>
|
|
<li>nucleotide spans corresponding to the transposon/IS</li>
|
|
</ul>
|
|
<strong>Optional:</strong>
|
|
<ul>
|
|
<li>name and nucleotide intervals of any host gene/product disrupted
|
|
by the transposon/IS</li>
|
|
<li>name and nucleotide intervals of any gene/product in the
|
|
transposon/IS (eg, transposase)</li>
|
|
<li>nucleotide spans of any other features (LTRs, repeat regions)</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
|
|
<h3>Example:</h3>
|
|
<p CLASS="TEXT"><pre>Bacillus subtilis strain RS2 transposon BLT transposase (tnpA) gene, complete cds.
|
|
|
|
source 1..1221
|
|
/organism="Bacillus subtilis"
|
|
/strain="RS2"
|
|
|
|
repeat_region 21..1127
|
|
/rpt_type=dispersed
|
|
/mobile_element="transposon: BLT"
|
|
|
|
repeat_region 21..61
|
|
/rpt_type=inverted
|
|
|
|
gene 128..1034
|
|
/gene="tnpA"
|
|
|
|
CDS 128..1034
|
|
/gene="tnpA"
|
|
/product="transposase"
|
|
|
|
repeat_region 1085..1127
|
|
/rpt_type=inverted
|
|
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="micro"></a>
|
|
<div>
|
|
<h2>Microsatellite sequence</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a microsatellite sequence:</strong>
|
|
<ul>
|
|
<li>unique microsatellite/clone name for each sequence</li>
|
|
<li>interval of any repeat region(s) within the microsatellite sequence,
|
|
if known</li>
|
|
<li>whether these are considered <a href="https://www.ncbi.nlm.nih.gov/dbSTS/how_to_submit.html">STS sequences</a></li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example #1:</h3>
|
|
<p CLASS="TEXT"><pre>Chorthippus parallelus clone IIB-G5 microsatellite sequence.
|
|
|
|
source 1..288
|
|
/organism="Chorthippus parallelus"
|
|
/mol_type="genomic DNA"
|
|
/db_xref="taxon:37639"
|
|
/clone="IIB-G5"
|
|
|
|
repeat_region 1..288
|
|
/rpt_type=tandem
|
|
/satellite="microsatellite"
|
|
</pre>
|
|
</p>
|
|
<h3>Example #2:</h3>
|
|
<p CLASS="TEXT"><pre>Noturus exilis voucher KU 40271 microsatellite Noex254 sequence.
|
|
|
|
source 1..556
|
|
/organism="Noturus exilis"
|
|
/mol_type="genomic DNA"
|
|
/specimen_voucher="KU 40271"
|
|
/db_xref="taxon:61323"
|
|
/clone="Noex_02_03_H06"
|
|
/PCR_primers="fwd_seq: catgtttgcacaaagggaaa, rev_seq:
|
|
atgtggatgcagattgtgga"
|
|
|
|
repeat_region 77..100
|
|
/rpt_type=tandem
|
|
/rpt_unit_range=77..100
|
|
/rpt_unit_seq="ca"
|
|
/satellite="microsatellite:Noex254"
|
|
</pre>
|
|
</p>
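<p>If the repeat interval within a microsatellite sequence is not yet known, it can be located by searching for consecutive copies of the repeat unit. The following is a rough sketch, not an NCBI tool: the function name, the toy sequence, and the minimum of four copies are all assumptions chosen for illustration. Positions are reported as 1-based, inclusive intervals so they can be used directly as a <code>repeat_region</code> span.</p>

```python
import re


def find_tandem_repeats(sequence, unit, min_copies=4):
    """Return 1-based, inclusive (start, end) intervals where `unit`
    occurs at least `min_copies` times in a row."""
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies),
                         re.IGNORECASE)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]


# Toy sequence: 5 flanking bases, 12 copies of "ca", 5 flanking bases.
toy = "ggatc" + "ca" * 12 + "ttgac"
print(find_tandem_repeats(toy, "ca"))  # [(6, 29)]
```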
|
|
</div>
|
|
<a name="repeat"></a>
|
|
<div>
|
|
<h2>Repeat regions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for sequences containing repeat regions:</strong>
|
|
<ul>
|
|
<li>repeat region intervals</li>
|
|
<li>repeat family, if known (eg, Alu, Mer)</li>
|
|
<li>repeat type (tandem, inverted, flanking, terminal, direct,
|
|
dispersed, nested, long_terminal_repeat,
|
|
non_ltr_retrotransposon_polymeric_tract, centromeric_repeat,
|
|
telomeric_repeat, x_element_combinatorial_repeat, y_prime_element,
|
|
or other)</li>
|
|
<li>repeat unit description/intervals, if region contains more than one
|
|
repeat</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p CLASS="TEXT"><pre>Homo sapiens repeat regions
|
|
|
|
source 1..2050
|
|
/organism="Homo sapiens"
|
|
/chromosome="6"
|
|
/map="6q25"
|
|
|
|
repeat_region 8..126
|
|
/rpt_type=dispersed
|
|
/rpt_family="B2"
|
|
|
|
repeat_region 197..344
|
|
/rpt_type=direct
/rpt_unit_range=197..220
|
|
|
|
repeat_region 389..673
|
|
/rpt_family="AluSx"
|
|
/rpt_type=dispersed
|
|
|
|
repeat_region 847..876
|
|
/rpt_type=tandem
/rpt_unit_seq="ca"
|
|
/satellite="microsatellite:BT21"
|
|
|
|
repeat_region 2000..2050
|
|
/rpt_type=long_terminal_repeat
|
|
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="pseudo"></a>
|
|
<div>
|
|
<h2>Pseudogene</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a pseudogene sequence:</strong>
|
|
<ul>
|
|
<li>gene intervals</li>
|
|
<li>gene name</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p CLASS="TEXT"><pre>Mus musculus DNA methyltransferase (Dmt1) pseudogene, complete sequence.
|
|
|
|
source 1..2131
|
|
/organism="Mus musculus"
|
|
/strain="SvJ/129"
|
|
|
|
gene 123..1444
|
|
/gene="Dmt1"
|
|
/note="DNA methyltransferase 1"
|
|
/pseudo
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="fusion"></a>
|
|
<div>
|
|
<h2>Translocation and/or fusion protein</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a sequence resulting from a chromosomal translocation:</strong>
|
|
<ul>
|
|
<li>nucleotide location of the translocation breakpoint, if known</li>
|
|
<li>map information for the translocation breakpoint (e.g., t(18;X)(q11.2;p11.2))</li>
|
|
</ul>
|
|
If the translocation results in a fusion protein, please include:
|
|
<ul>
|
|
<li>coding region intervals, including start and stop codons, if
|
|
present</li>
|
|
<li>protein name</li>
|
|
<li>amino acid sequence, if known</li>
|
|
</ul>
|
|
<br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p CLASS="TEXT"><pre>Homo sapiens SYT/SSX4 fusion protein mRNA, complete cds.
|
|
|
|
source 1..2935
|
|
/organism="Homo sapiens"
|
|
/tissue_type="sarcoma"
|
|
/map="t(18;X)(q11.2;p11.2)"
|
|
|
|
source 1..1242
|
|
/organism="Homo sapiens"
|
|
/chromosome="18"
|
|
/map="18q11.2"
|
|
|
|
CDS 1..1479
|
|
/product="SYT/SSX4 fusion protein"
|
|
|
|
source 1243..2935
|
|
/organism="Homo sapiens"
|
|
/chromosome="X"
|
|
/map="Xp11.2"
|
|
|
|
3'UTR 1480..2935
|
|
</pre>
|
|
</p>
|
|
</div>
|
|
<a name="clone"></a>
|
|
<div>
|
|
<h2>Cloning vector</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for a cloning vector:</strong>
|
|
<ul>
|
|
<li>unique name for the cloning vector</li>
|
|
</ul>
|
|
<strong>Optional:</strong>
|
|
<ul>
|
|
<li>coding region intervals, including start and stop codons</li>
|
|
<li>protein names, gene names</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p><pre>Cloning vector pRB223, complete sequence.
|
|
|
|
source 1..4361
|
|
/organism="Cloning vector pRB223"
|
|
|
|
gene 86..1276
|
|
/gene="tet"
|
|
|
|
CDS 86..1276
|
|
/gene="tet"
|
|
/product="tetracycline resistance protein"
|
|
|
|
RBS 1905..1909
|
|
/note="Shine-Dalgarno sequence"
|
|
|
|
rep_origin 2535
|
|
|
|
gene complement(3293..4194)
|
|
/gene="bla"
|
|
|
|
CDS complement(3293..4153)
|
|
/gene="bla"
|
|
/product="beta-lactamase"
|
|
|
|
misc_feature 4069..4125
|
|
/note="multiple cloning site"
|
|
|
|
RBS complement(4161..4165)
|
|
/gene="bla"
|
|
/note="Shine-Dalgarno sequence"
|
|
|
|
promoter complement(4188..4194)
|
|
/gene="bla"
|
|
</pre>
|
|
</p>
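<p>Features written as <code>complement(...)</code>, such as the <em>bla</em> coding region above, are read from the strand opposite to the submitted sequence. The sketch below shows how such a feature could be extracted; the function name and the toy sequence are hypothetical, and only the four unambiguous bases are handled.</p>

```python
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def extract_feature(sequence, start, end, complement=False):
    """Return the bases of a 1-based, inclusive interval.

    With complement=True the reverse complement is returned, which is
    how a complement(start..end) location is read.
    """
    span = sequence[start - 1:end]
    if complement:
        span = span.translate(_COMPLEMENT)[::-1]
    return span


# Toy sequence: a short ORF (atgaaatag) encoded on the opposite strand.
toy = "gg" + "ctatttcat" + "tt"
print(extract_feature(toy, 3, 11, complement=True))  # atgaaatag

# The bla CDS above, complement(3293..4153), spans whole codons:
print((4153 - 3293 + 1) % 3)  # 0
```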
|
|
</div>
|
|
<a name="gapseq"></a>
|
|
<div>
|
|
<h2>Gapped sequence</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<p> A gapped sequence includes both known, directly sequenced data and
|
|
unknown data. The unknown sections of sequence are represented by strings of
|
|
'nnn' between the known, directly sequenced, contiguous data. All pieces of
|
|
a gapped sequence must be from the same source and be in the same
|
|
orientation and in the correct order. </p>
|
|
<strong>Relevant feature information for a gapped sequence:</strong>
|
|
<ul>
|
|
<li>if the gap length is estimated, insert that number of n's between
the directly determined, contiguous sections of sequence</li>
<li>if the gap length is unknown, insert a string of 100 n's to represent the
gap between the sections of sequence</li>
|
|
<li>add a misc_feature for each gap with a /note qualifier to describe it
|
|
as either 'gap of unknown length' or 'gap of estimated length, # nts'</li>
|
|
<li>add all other appropriate features (exons, introns, CDS, gene, etc)</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
<h3>Example:</h3>
|
|
<p><pre>Homo sapiens MHC class I antigen (HLA-B) gene, HLA-B_458_01445 allele, exons 2, 3 and partial cds.
|
|
|
|
source 1..788
|
|
/organism="Homo sapiens"
|
|
/mol_type="genomic DNA"
|
|
/db_xref="taxon:9606"
|
|
|
|
gene <1..>788
|
|
/gene="HLA-B"
|
|
/allele="HLA-B_458_01445"
|
|
|
|
mRNA join(<1..270,513..>788)
|
|
/gene="HLA-B"
|
|
/allele="HLA-B_458_01445"
|
|
/product="MHC class I antigen"
|
|
|
|
CDS join(<1..270,513..>788)
|
|
/gene="HLA-B"
|
|
/allele="HLA-B_458_01445"
|
|
/codon_start=3
|
|
/product="MHC class I antigen"
|
|
/protein_id="ACR38915.1"
|
|
/db_xref="GI:238055051"
|
|
/translation="SHSMRYFDTAMSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRE
|
|
EPRAPWIEQEGPEYWDRNTQIFKTNTQTDRESLRNLRGYYNQSEAGSHTLQSMYGCDV
|
|
GPDGRLLRGHDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAARVAEQDRAYLE
|
|
GTCVEWLRRYLENGKDTLERA"
|
|
|
|
exon 1..270
|
|
/gene="HLA-B"
|
|
/allele="HLA-B_458_01445"
|
|
/number=2
|
|
|
|
gap 271..512
|
|
/estimated_length=242
|
|
|
|
exon 513..788
|
|
/gene="HLA-B"
|
|
/allele="HLA-B_458_01445"
|
|
/number=3
|
|
</pre>
|
|
</p>
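<p>The gap conventions described above can be applied mechanically when joining contiguous pieces. The sketch below is one possible way to do so, with hypothetical names throughout: it inserts the estimated number of n's, or 100 n's when the length is unknown, and reports each gap interval with the note text suggested above. The contigs are assumed to be in the correct order and orientation already.</p>

```python
UNKNOWN_GAP_LENGTH = 100  # placeholder length for gaps of unknown size


def assemble_gapped(contigs, gap_lengths):
    """Join contigs with runs of 'n'.

    gap_lengths holds one entry per gap between consecutive contigs:
    an integer for an estimated length, or None when unknown.
    Returns (sequence, gaps), where each gap is (start, end, note)
    using 1-based, inclusive coordinates.
    """
    if len(gap_lengths) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    gaps = []
    position = len(contigs[0])
    for contig, gap in zip(contigs[1:], gap_lengths):
        if gap is None:
            size = UNKNOWN_GAP_LENGTH
            note = "gap of unknown length"
        else:
            size = gap
            note = "gap of estimated length, %d nts" % gap
        gaps.append((position + 1, position + size, note))
        parts.append("n" * size)
        parts.append(contig)
        position += size + len(contig)
    return "".join(parts), gaps


# Same layout as the HLA-B example: 270 bases, a 242-base gap, 276 bases.
sequence, gaps = assemble_gapped(["a" * 270, "c" * 276], [242])
print(len(sequence))  # 788
print(gaps)           # [(271, 512, 'gap of estimated length, 242 nts')]
```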
|
|
</div>
|
|
<a name="popset"></a>
|
|
<div>
|
|
<h2>Phylogenetic or population set</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for population or phylogenetic studies:</strong>
|
|
<p>
|
|
A set comprises a group of sequences that represent the same gene or locus
|
|
in different organisms or in different isolates, strains, or clones of the
|
|
same organism. A set can be, for example, phylogenetic (different organisms), population (same organism), or environmental (unclassified or unknown organisms).
|
|
</p>
|
|
<ul>
|
|
<li>unique descriptive information for each sequence (eg, clone, strain,
|
|
isolate, or organism names)</li>
|
|
<li>creating a set allows the sequences to be retrieved as a group through Entrez <a
href="https://www.ncbi.nlm.nih.gov/sites/entrez?db=PopSet">PopSet</a></li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
</div>
|
|
|
|
<a name="est"></a>
|
|
<div>
|
|
<h2>EST submissions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
Please submit directly to <a href="https://www.ncbi.nlm.nih.gov/dbEST/how_to_submit.html">dbEST</a>, the EST division of GenBank.
|
|
</div>
|
|
|
|
<a name="gss"></a>
|
|
<div>
|
|
<h2>GSS submissions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
Please submit directly to <a href="https://www.ncbi.nlm.nih.gov/dbGSS/how_to_submit.html">dbGSS</a>, the GSS division of GenBank.
|
|
</div>
|
|
<a name="sts"></a>
|
|
<div>
|
|
<h2>STS submissions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for STS submissions:</strong>
|
|
<br/>
|
|
<ul>
|
|
<li>submit directly to <a
href="https://www.ncbi.nlm.nih.gov/dbSTS/how_to_submit.html">dbSTS</a>,
the STS division of GenBank</li>
|
|
</ul>
|
|
OR
|
|
<ul>
|
|
<li>submit using BankIt and provide:</li>
|
|
<ul>
|
|
<li>chromosome and/or specific map locations</li>
|
|
<li>clone name</li>
|
|
<li>clone library [catalog number, reference, lab source, and/or
|
|
specific (in-house) name or number]</li>
|
|
<li>PCR conditions and primer binding sites</li>
|
|
</ul>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
</div>
|
|
<a name="htgs"></a>
|
|
<div>
|
|
<h2>HTGS submissions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Requirements for HTGS submissions:</strong><br/>
|
|
<ul>
|
|
<li>large genome centers should submit these through an FTP account
to the <a href="https://www.ncbi.nlm.nih.gov/HTGS/">High Throughput Genomic
(HTG) Sequences division</a> of GenBank</li>
|
|
<li>one-time submitters should submit to gb-sub@ncbi.nlm.nih.gov</li>
|
|
</ul>
|
|
</div>
|
|
<a name="flics"></a>
|
|
<div>
|
|
<h2>FLICs submissions</h2>
|
|
<div style="float:right"><a href="#top">Top</a></div>
|
|
<strong>Relevant feature information for FLIC submissions:</strong>
|
|
<ul>
|
|
<li>explicit labeling as FLICs</li>
|
|
</ul>
|
|
<strong>Optional:</strong>
|
|
<ul>
|
|
<li>protein name</li>
|
|
<li>gene name</li>
|
|
<li>CDS intervals, including start/stop codons</li>
|
|
</ul><br/>
|
|
We strongly suggest that you provide as much of the above information
|
|
as possible to ensure the most complete annotation of your sequence.
|
|
If any of this information is not known, please inform us.<br/><br/>
|
|
</div>
|
|
</div>
|
|
<!-- end of Content -->
|
|
|
|
<!-- Footer -->
|
|
<div id="ncbifooter">
|
|
<div id="footer-contents-right">
|
|
<a href="https://www.nih.gov" title="NIH" class="nih_img_link">NIH</a> <a href="https://www.nlm.nih.gov" title="NLM" class="nlm_img_link">NLM</a> <a href="https://www.hhs.gov" title="DHHS" class="dhhs_img_link">DHHS</a> <a href="https://www.usa.gov" title="USA.gov" class="usagov_img_link">USA.gov</a>
|
|
|
|
</div>
|
|
<div id="footer-contents-left">
|
|
<a href="https://www.ncbi.nlm.nih.gov/About/glance/contact_info.html">Contact</a> | <a href="https://www.ncbi.nlm.nih.gov/About/disclaimer.html">Copyright</a> | <a href="https://www.ncbi.nlm.nih.gov/About/disclaimer.html#disclaimer">Disclaimer</a> | <a href="https://www.nlm.nih.gov/privacy.html">Privacy</a> | <a href="https://www.nlm.nih.gov/accessibility.html">Accessibility</a>
|
|
|
|
<p class="address">
|
|
National Center for Biotechnology Information, US National Library of Medicine<br/>
|
|
8600 Rockville Pike, Bethesda, MD USA 20894
|
|
</p>
|
|
</div>
|
|
</div>
|
|
<!-- end of Footer -->
|
|
</body>
|
|
</html>
|