<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Feature Table File</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>

<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
   data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg"
     alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>BankIt Submission Help: Feature Table File</h1>
<div class="border1"><p>BankIt accepts features as a five-column, tab-delimited
table file. The feature table specifies the location and type of each feature.
BankIt processes the feature intervals and translates any CDS features into
proteins.</p>
<p>The feature table format allows different kinds of features (e.g., gene,
mRNA, coding region, tRNA) and qualifiers (e.g., /product, /note) to be
annotated. Valid <a href="http://www.insdc.org/documents/feature_table.html#7.2">features</a>
and <a href="http://www.insdc.org/documents/feature_table.html#7.3.1">qualifiers</a>
are restricted to those approved by the International Nucleotide Sequence Database Collaboration (INSDC).</p>
</div>

<h2>Preparing the Feature Table File</h2>

<div class="border1"><p>The first line of the feature table contains the
following basic information:<br/><br/>
>Feature Sequence_ID
<br/><br/>
The sequence identifier (Sequence_ID) must match the label used to identify the
table's corresponding sequence in the <a href="fasta.html">nucleotide FASTA file</a>. <br/>
Subsequent lines of the table list the features.
</p>

<p>Prepare the feature table file in a text editor and save it as <strong>plain ASCII
text (not .rtf or .doc)</strong>.</p>

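<p>The two requirements above (plain ASCII text, and a Sequence_ID that matches a
label in the nucleotide FASTA file) can be checked before uploading. The following
is a minimal sketch, not part of BankIt. The function names are hypothetical, and
it assumes the FASTA label is the first word after ">" on each definition line.</p>

```python
def table_sequence_ids(table_text):
    """Return the Sequence_ID from each '>Feature' header line."""
    ids = []
    for line in table_text.splitlines():
        if line.startswith(">Feature"):
            parts = line.split()
            if len(parts) < 2:
                raise ValueError(f"header has no Sequence_ID: {line!r}")
            ids.append(parts[1])
    return ids


def fasta_sequence_ids(fasta_text):
    """Return the first word of each FASTA definition line.

    Assumption: the sequence label is the first whitespace-delimited
    word after '>'.
    """
    ids = []
    for line in fasta_text.splitlines():
        words = line[1:].split() if line.startswith(">") else []
        if words:
            ids.append(words[0])
    return ids


def check_feature_table(table_text, fasta_text):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not table_text.isascii():
        problems.append("table contains non-ASCII characters")
    known = set(fasta_sequence_ids(fasta_text))
    for seq_id in table_sequence_ids(table_text):
        if seq_id not in known:
            problems.append(f"no FASTA sequence labelled {seq_id!r}")
    return problems
```

<p>An empty list from <code>check_feature_table</code> only means that these two
checks passed. It does not validate feature names, qualifiers, or locations.</p>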
<p>Format for a feature table:</p>
<ul>
<li>Each feature is shown on a separate line.</li>
<br />
<li>Multiple nucleotide intervals for a feature are on subsequent lines.</li>
<br />
<li>Qualifier(s) describing a feature are on the line(s) below that feature and its intervals.</li>
<br />
<li>Each column is separated by a <strong>tab.</strong></li>
</ul>

<p>As shown in the examples below: <br/><br/>
Line 1 (feature line):<br/>
Column 1: Start location (first nucleotide) of a feature<br/>
Column 2: Stop location (last nucleotide) of a feature<br/>
Column 3: Feature name (for example, 'CDS' or 'mRNA' or 'rRNA' or 'gene' or
'exon')<br/><br/>
Line 2 (qualifier line; columns 1 to 3 are left empty):<br/>
Column 4: Qualifier name (for example, 'product' or 'number' or 'gene' or 'note')<br/>
Column 5: Qualifier value<br/>
</p>

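<p>The column layout described above can be read with a short parser. The following
is an illustrative sketch only, not BankIt's own implementation. The function name
and the dictionary layout are hypothetical. Locations are kept as text so that any
"<" or ">" markers are preserved.</p>

```python
def parse_feature_table(text):
    """Parse five-column feature table text into {sequence_id: [feature, ...]}.

    Each feature is a dict with 'name', 'intervals' (list of (start, stop)
    strings) and 'qualifiers' (list of (name, value) pairs).
    """
    tables = {}
    features = None  # feature list of the table currently being read
    current = None   # feature currently being filled in
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between tables
        if line.startswith(">Feature"):
            parts = line.split()
            if len(parts) < 2:
                raise ValueError(f"header has no Sequence_ID: {line!r}")
            features = tables.setdefault(parts[1], [])
            current = None
            continue
        if features is None:
            raise ValueError("data line before the first '>Feature' header")
        cols = line.split("\t")
        cols += [""] * (5 - len(cols))  # pad short lines to five columns
        start, stop, name, qualifier, value = cols[:5]
        if start and stop:
            if name:  # columns 1-3 filled: a new feature begins
                current = {"name": name, "intervals": [], "qualifiers": []}
                features.append(current)
            elif current is None:
                raise ValueError(f"interval without a feature: {line!r}")
            current["intervals"].append((start, stop))
        elif qualifier:
            if current is None:
                raise ValueError(f"qualifier without a feature: {line!r}")
            current["qualifiers"].append((qualifier, value))
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return tables


# A small table with one two-interval feature and one qualifier.
sample = ">Feature Seq2\n2626\t2590\ttRNA\n2570\t2535\n\t\t\tproduct\ttRNA-Phe\n"
print(parse_feature_table(sample))
```

<p>A line with only columns 1 and 2 filled is treated as an additional interval of
the preceding feature, and a line with only columns 4 and 5 filled as a qualifier
of that feature.</p>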
<p>Note in the examples below that 'gene' is both a feature and a
qualifier. The feature name is entered in column 3 and the qualifier name in
column 4, on separate lines.
</p>

<p>The examples below show sample tables and illustrate a number of points
about the table format.
</p>

<dl>
<dd><pre>
>Feature Seq1
<1	>1050	gene
			gene	ATH1
<1	1009	CDS
			product	acid trehalase
			product	Ath1p
			codon_start	2
<1	>1050	mRNA
			product	acid trehalase

>Feature Seq2
2626	2590	tRNA
2570	2535
			product	tRNA-Phe

>Feature Seq3
1080	1210	CDS
1275	1315
			product	actin
			note	alternatively spliced
1055	1210	mRNA
1275	1340
			product	actin
1055	1340	gene
			gene	ACT
1055	1079	5'UTR
1316	1340	3'UTR
</pre>
</dd>
</dl>

<ul>
<li>Features that are on the complementary strand, such as the tRNA-Phe, are indicated by reversing the interval locations.</li>
<br/>
<li>Locations of partial (incomplete) features are indicated with a ">" or
"<" next to the number. In the Seq1 example, the gene, CDS and mRNA all
begin upstream of the start of the nucleotide sequence.
The "<" symbol indicates that they are 5' partial features and the ">" symbol
indicates that the gene and mRNA are 3' partial.
Furthermore, for the protein to translate correctly, the reading frame of a
5' partial CDS must be indicated with the qualifier "codon_start". In the Seq1
example, codon_start 2 means that translation begins at the second nucleotide
of the interval. There is no need to indicate codon_start on complete CDSs,
because translation is assumed to start at the first nucleotide of the interval
if no codon_start is provided.
</li>
<br/>
<li>If a feature contains multiple intervals, like the spliced tRNA-Phe, each
interval is listed on a separate line by its start and stop position before
subsequent qualifier lines.</li>
<br/>
<li>Gene features are always a single interval, and their location should cover
the intervals of all the relevant features (for example: CDS plus 5'UTR plus 3'UTR).</li>
<br/>
<li>If a protein has more than one name, each can be listed as a
separate product qualifier on the CDS in the table. The value of the first
product qualifier will become the /product on the CDS in the flatfile, and any
additional product qualifiers will be shown as a /note on the CDS in the
flatfile. All CDS features must have at least one product.</li>
<br/>
<li>A flatfile /note can be added to any feature using the qualifier note in the
table.</li>
</ul>

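<p>The location conventions in the list above (reversed intervals for the
complementary strand, "<" and ">" for partial features, and codon_start for the
reading frame) can be expressed as a small helper. This is a sketch under the
assumptions stated in its comments. The function names are hypothetical, and the
translation start is only meaningful for CDS features.</p>

```python
def position(token):
    """Return the numeric position from a location such as '<1' or '>1050'."""
    return int(token.lstrip("<>"))


def describe_location(intervals, codon_start=1):
    """Summarise a feature's location from its (start, stop) string pairs.

    Assumptions, following the conventions described above:
    - a start greater than its stop means the complementary strand;
    - '<' on the first start marks a 5' partial feature;
    - '>' on the last stop marks a 3' partial feature;
    - codon_start defaults to 1 when the qualifier is absent.
    """
    first_start, first_stop = intervals[0]
    last_stop = intervals[-1][1]
    complementary = position(first_start) > position(first_stop)
    offset = codon_start - 1
    if complementary:
        translation_start = position(first_start) - offset
    else:
        translation_start = position(first_start) + offset
    return {
        "strand": "complementary" if complementary else "forward",
        "five_prime_partial": first_start.startswith("<"),
        "three_prime_partial": last_stop.startswith(">"),
        "translation_start": translation_start,  # meaningful for CDS only
    }


# The Seq1 CDS from the example table: 5' partial, codon_start 2.
print(describe_location([("<1", "1009")], codon_start=2))
# The Seq2 tRNA: two intervals on the complementary strand.
print(describe_location([("2626", "2590"), ("2570", "2535")]))
```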
</div>

</body>
</html>