<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Feature Table File</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg"
alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>BankIt Submission Help: Feature Table File</h1>
<div class="border1"><p>BankIt accepts features as a five-column, tab-delimited
table file. The feature table specifies the location and type of each feature,
and BankIt processes the feature intervals and translates any CDS features into
proteins.</p>
<p>The feature table format allows different kinds of features (e.g., gene,
mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be
annotated. The valid <a href="http://www.insdc.org/documents/feature_table.html#7.2">features</a>
and <a href="http://www.insdc.org/documents/feature_table.html#7.3.1">qualifiers</a>
are restricted to those approved by the International Nucleotide Sequence Database Collaboration.</p>
</div>
<h2>Preparing the Feature Table File</h2>
<div class="border1"><p>The first line of each feature table is a header line
of the form:<br/><br/>
&gt;Feature Sequence_ID
<br/><br/>
The sequence identifier (Sequence_ID) must match the label used to identify the
table's corresponding sequence in the <a href="fasta.html">nucleotide FASTA file</a>.
A single file may hold several tables, one per sequence, each beginning with its
own &gt;Feature line (as with Seq1, Seq2 and Seq3 in the examples below).<br/>
Subsequent lines of each table list the features.
</p>
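<p>Because each Sequence_ID must match a label in the nucleotide FASTA file, a quick
consistency check before submitting can catch typos. The Python sketch below is a
hypothetical illustration (fasta_ids(), table_ids() and unmatched_ids() are not
BankIt functions) and assumes that the FASTA label is the first word after "&gt;"
on each definition line.</p>
<pre>
```python
# Minimal sketch: list the Sequence_IDs in a feature table that do not match
# any label in the nucleotide FASTA file. Assumption: the FASTA label is the
# first whitespace-delimited word after ">" on each definition line.
def fasta_ids(fasta_text: str) -> set[str]:
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].split():
            ids.add(line[1:].split()[0])
    return ids


def table_ids(table_text: str) -> list[str]:
    # Only header lines of the form ">Feature Sequence_ID" are collected.
    ids = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == ">Feature":
            ids.append(fields[1])
    return ids


def unmatched_ids(table_text: str, fasta_text: str) -> list[str]:
    known = fasta_ids(fasta_text)
    return [sid for sid in table_ids(table_text) if sid not in known]


fasta = ">Seq1\nACGTACGT\n>Seq2\nTTGACCA\n"
table = ">Feature Seq1\n1\t8\tgene\n>Feature Seq3\n1\t7\tgene\n"
print(unmatched_ids(table, fasta))  # ['Seq3']
```
</pre>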
<p>Prepare the feature table file in a text editor and save it as <strong>plain ASCII
text (not .rtf or .doc)</strong>.</p>
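<p>A minimal Python sketch of a pre-upload check that a file really is plain ASCII.
ascii_problems() is a hypothetical helper, not part of BankIt: it flags a leading
{\rtf signature (the start of an RTF file) and reports the first byte outside the
ASCII range, which is how characters such as curly quotes or non-breaking spaces
pasted from a word processor would typically show up.</p>
<pre>
```python
# Minimal sketch (hypothetical helper): report why a feature table file may
# not be plain ASCII text.
def ascii_problems(data: bytes) -> list[str]:
    problems = []
    # RTF files begin with the signature "{\rtf".
    if data.startswith(b"{\\rtf"):
        problems.append("file looks like RTF, not plain text")
    for offset, byte in enumerate(data):
        if byte > 127:
            # Report only the first offending byte to keep the output short.
            problems.append(f"non-ASCII byte 0x{byte:02x} at offset {offset}")
            break
    return problems


print(ascii_problems(b">Feature Seq1\n"))                       # []
print(ascii_problems(b"{\\rtf1\\ansi hello}"))                  # ['file looks like RTF, not plain text']
print(ascii_problems(">Feature Seq1\u00a0\n".encode("utf-8")))  # ['non-ASCII byte 0xc2 at offset 13']
```
</pre>
<p>In practice the bytes would come from reading the file in binary mode, for
example with open(path, "rb").read().</p>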
<p>Format for a feature table:</p>
<ul>
<li>Each feature starts on a separate line.</li>
<li>Additional nucleotide intervals of a multi-interval feature are on the lines that follow it.</li>
<li>Qualifiers describing a feature are on the line(s) below that feature and its intervals.</li>
<li>Columns are separated by a <strong>tab</strong>.</li>
</ul>
<p>As shown in the examples below:<br/><br/>
Feature line:<br/>
Column 1: Start location (first nucleotide) of a feature<br/>
Column 2: Stop location (last nucleotide) of a feature<br/>
Column 3: Feature name (for example, 'CDS', 'mRNA', 'rRNA', 'gene' or 'exon')<br/><br/>
Additional interval lines (if any) contain only columns 1 and 2.<br/><br/>
Qualifier lines (columns 1 to 3 are left empty, so each line begins with three tabs):<br/>
Column 4: Qualifier name (for example, 'product', 'number', 'gene' or 'note')<br/>
Column 5: Qualifier value<br/>
</p>
<p>Note in the examples below that 'gene' is both a feature and a qualifier: it
is entered in column 3 as a feature name and in column 4 as a qualifier name.
</p>
<p>The examples below show sample tables and illustrate a number of points
about the table format.
</p>
<dl>
<dd><pre>
&gt;Feature Seq1
&lt;1	&gt;1050	gene
			gene	ATH1
&lt;1	1009	CDS
			product	acid trehalase
			product	Athlp
			codon_start	2
&lt;1	&gt;1050	mRNA
			product	acid trehalase
&gt;Feature Seq2
2626	2590	tRNA
2570	2535
			product	tRNA-Phe
&gt;Feature Seq3
1080	1210	CDS
1275	1315
			product	actin
			note	alternatively spliced
1055	1210	mRNA
1275	1340
			product	actin
1055	1340	gene
			gene	ACT
1055	1079	5'UTR
1316	1340	3'UTR
</pre>
</dd>
</dl>
<ul>
<li>Features on the complementary strand, such as the tRNA-Phe in Seq2, are
indicated by reversing the interval locations, so that the start location is
larger than the stop location.</li>
<li>Locations of partial (incomplete) features are indicated with a "&lt;" or
"&gt;" next to the number. In the Seq1 example, the gene, CDS and mRNA all
begin upstream of the start of the nucleotide sequence, so the "&lt;" symbol
marks them as 5' partial; the "&gt;" symbol marks the gene and mRNA as 3' partial.
Furthermore, for the protein to translate correctly, the correct reading frame
must be indicated with the qualifier "codon_start" on the CDS (codon_start 2 in
Seq1). Complete CDSs do not need codon_start: if none is provided, translation
is assumed to start at the first nucleotide of the interval.
</li>
<li>If a feature contains multiple intervals, like the spliced tRNA-Phe, each
interval is listed on its own line by its start and stop positions, before any
qualifier lines.</li>
<li>Gene features always consist of a single interval, whose location should span
the intervals of all the related features (for example, the CDS plus the 5'UTR
and 3'UTR, as for the ACT gene in Seq3).</li>
<li>If a protein has more than one name, each name can be listed as a separate
product qualifier on the CDS. The value of the first product qualifier becomes
the /product on the CDS in the flatfile, and any additional product qualifiers
are shown as a /note on the CDS in the flatfile. Every CDS feature must have at
least one product qualifier.</li>
<li>A flatfile /note can be added to any feature using the note qualifier in the
table.</li>
</ul>
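<p>The conventions in this list can be sketched as a small Python reader. This is
a hypothetical illustration, not BankIt's own processing: parse_table() assumes
well-formed, tab-delimited input, treats "&lt;" in column 1 as 5' partial and
"&gt;" in column 2 as 3' partial (as in Seq1), reports a feature as being on the
complementary strand when its first interval is reversed, defaults codon_start
to 1, and maps every product after the first to a /note.</p>
<pre>
```python
# Minimal sketch (hypothetical helpers, not BankIt's own code) that reads a
# five-column feature table and applies the conventions listed above.
from dataclasses import dataclass, field


@dataclass
class Feature:
    key: str                                   # column 3, e.g. "CDS"
    intervals: list[tuple[int, int]] = field(default_factory=list)
    partial5: bool = False                     # "<" seen in column 1
    partial3: bool = False                     # ">" seen in column 2
    qualifiers: list[tuple[str, str]] = field(default_factory=list)

    @property
    def on_complement(self) -> bool:
        # Reversed locations (start > stop) mark the complementary strand.
        start, stop = self.intervals[0]
        return start > stop


def parse_location(text: str) -> tuple[int, str]:
    """Split '<1' into (1, '<'); a bare number gets the marker ''."""
    marker = text[0] if text[:1] in ("<", ">") else ""
    return int(text[len(marker):]), marker


def parse_table(text: str) -> dict[str, list[Feature]]:
    tables: dict[str, list[Feature]] = {}
    current: list[Feature] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        cols = raw.split("\t")
        if cols[0].startswith(">Feature"):
            current = tables.setdefault(cols[0].split()[1], [])
        elif cols[0]:
            # Columns 1-2 hold an interval; a name in column 3 starts a new feature.
            start, start_mark = parse_location(cols[0])
            stop, stop_mark = parse_location(cols[1])
            if len(cols) > 2 and cols[2]:
                current.append(Feature(cols[2]))
            feat = current[-1]
            feat.intervals.append((start, stop))
            feat.partial5 |= start_mark == "<"
            feat.partial3 |= stop_mark == ">"
        else:
            # Qualifier line: columns 1-3 empty, name and value in columns 4-5.
            value = cols[4] if len(cols) > 4 else ""
            current[-1].qualifiers.append((cols[3], value))
    return tables


def codon_start(feat: Feature) -> int:
    """Reading frame of a CDS; 1 (first nucleotide) if no codon_start is given."""
    for name, value in feat.qualifiers:
        if name == "codon_start":
            return int(value)
    return 1


def flatfile_qualifiers(feat: Feature) -> list[tuple[str, str]]:
    """Keep the first product as /product; later products become /note."""
    out, seen_product = [], False
    for name, value in feat.qualifiers:
        if name == "product" and seen_product:
            out.append(("note", value))
        else:
            out.append((name, value))
            seen_product = seen_product or name == "product"
    return out


def problems(tables: dict[str, list[Feature]]) -> list[str]:
    msgs = []
    for seq_id, feats in tables.items():
        for feat in feats:
            names = {name for name, _ in feat.qualifiers}
            if feat.key == "CDS" and "product" not in names:
                msgs.append(f"{seq_id}: CDS without a product qualifier")
            if feat.key == "gene" and len(feat.intervals) != 1:
                msgs.append(f"{seq_id}: gene with more than one interval")
    return msgs


tables = parse_table(
    ">Feature Seq1\n<1\t1009\tCDS\n\t\t\tproduct\tacid trehalase\n"
    "\t\t\tproduct\tAthlp\n\t\t\tcodon_start\t2\n"
    ">Feature Seq2\n2626\t2590\ttRNA\n2570\t2535\n\t\t\tproduct\ttRNA-Phe\n"
)
cds = tables["Seq1"][0]
print(cds.partial5, cds.partial3, codon_start(cds))  # True False 2
print(flatfile_qualifiers(cds)[1])                   # ('note', 'Athlp')
print(tables["Seq2"][0].on_complement)               # True
print(problems(tables))                              # []
```
</pre>
<p>For Seq1 the sketch reports a 5' partial CDS in reading frame 2 and moves the
second product name (Athlp) to a note; the Seq2 tRNA is reported as being on the
complementary strand.</p>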
</div>
</body>
</html>