<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Nucleotide FASTA file</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg" alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>BankIt Submission Help: Nucleotide FASTA file</h1>
<h2>Use Plain Text Format:</h2>
<div class="border1">
<ul>
<li>Use a text editor (for example, WordPad) to prepare the FASTA file of nucleotide sequences.</li>
<li>Be sure to save your file as "Plain Text" or "Text Document".</li>
<li>If you are not sure that the "Save" option in your program does this automatically, use "Save As...". In the "Save as type:" pull-down menu, select "Text Document".</li>
<li>If using Word, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Plain Text (*.txt)".</li>
<li>Do <strong>not</strong> save the file as .doc or .rtf (rich text format); BankIt will not allow you to upload a file that is not plain text.</li>
</ul>
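<p>A file saved in a word-processor format can often be recognized by its first few bytes. The Python sketch below is one possible pre-upload check; it is an illustration only and is not part of BankIt. The names <code>detect_rich_format</code> and <code>check_fasta_path</code> are hypothetical, and the signatures listed are the usual leading bytes of RTF files, legacy .doc files, and ZIP containers such as .docx.</p>
<pre class="help"><code>
```python
from typing import Optional

# Usual leading bytes of file formats that are not plain text.
RICH_FORMAT_SIGNATURES = {
    b"{\\rtf": "RTF (rich text format)",
    b"\xd0\xcf\x11\xe0": "legacy Word .doc (OLE container)",
    b"PK\x03\x04": "ZIP container such as Word .docx",
}


def detect_rich_format(data: bytes) -> Optional[str]:
    """Return a format name if the bytes look like a word-processor file."""
    for signature, name in RICH_FORMAT_SIGNATURES.items():
        if data.startswith(signature):
            return name
    # Assumption: a FASTA file saved as ASCII or UTF-8 text has no NUL bytes.
    if b"\x00" in data:
        return "not ASCII/UTF-8 text (contains NUL bytes)"
    return None


def check_fasta_path(path: str) -> Optional[str]:
    """Hypothetical wrapper: inspect the first 4 KiB of a file on disk."""
    with open(path, "rb") as handle:
        return detect_rich_format(handle.read(4096))
```
</code></pre>
<p>A return value of <code>None</code> only means that none of these signatures was found; it does not show that the file is a valid FASTA file.</p>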
</div>
<h2>Content Rules:</h2>
<div class="border1">
<ul>
<li>Each sequence in the FASTA file contains a Definition Line followed by the sequence data.</li>
<li>The Definition Line for each sequence begins with a ">" followed by a Sequence_ID (SeqID). The SeqID identifies the same specimen in all the steps of a submission (for example, in the nucleotide FASTA file, in a protein FASTA file, or in a Source Modifier file).</li>
<li>Each SeqID must be unique within the file.</li>
<li>SeqIDs <strong><em>may not contain spaces</em></strong>.</li>
<li>SeqIDs may contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).</li>
<li>SeqIDs must be 25 characters or fewer.</li>
<li>The SeqID must be separated from the rest of the Definition Line text by a space.</li>
<li>It is recommended that the Definition Line include the organism name. If organism names are not included in the FASTA Definition Lines, they must be provided in a separate table on a subsequent page of the submission process.</li>
<li>The Organism Name must be provided in this format: <code><strong>[organism=Organism Name]</strong></code> (opening square bracket, the word "organism", an equals sign, the organism name, closing square bracket).</li>
<li><a href="genbank-source-table.html#modifiers">Source Modifiers</a> provided in the FASTA file Definition Line must follow the same format as Organism Name. Examples: [isolate=mosquito12] [clone=AC3] [strain=BuzzLY]</li>
<li>A brief, free-text description of the sequence may follow the formatted Organism Name and Source Modifiers. Examples: 'cytochrome oxidase I, partial CDS' 'trnH-psbA intergenic spacer'</li>
<li>The FASTA Definition Line <strong><em>may not contain</em></strong> any internal hard returns.</li>
<li>However, the FASTA Definition Line <strong><em>must be separated</em></strong> from the actual sequence by a hard return.</li>
</ul>
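<p>The SeqID and modifier rules above can be approximated with two regular expressions. The Python sketch below is an illustration under stated assumptions, not BankIt's own validation: it assumes that "letters" means ASCII letters, and the names <code>is_valid_seqid</code> and <code>parse_definition_line</code> are hypothetical.</p>
<pre class="help"><code>
```python
import re

# Allowed SeqID characters from the rules above, at most 25 of them.
# Assumption: "letters" is read as ASCII letters A-Z and a-z.
SEQID_PATTERN = re.compile(r"[A-Za-z0-9_.:*#-]{1,25}")
# A bracketed modifier such as [organism=Carpodacus mexicanus].
MODIFIER_PATTERN = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def is_valid_seqid(seq_id: str) -> bool:
    """Check the character set and the 25-character limit."""
    return SEQID_PATTERN.fullmatch(seq_id) is not None


def parse_definition_line(line: str):
    """Split a Definition Line into (seq_id, modifiers, description)."""
    if not line.startswith(">"):
        raise ValueError("Definition Line must begin with '>'")
    if "\n" in line or "\r" in line:
        raise ValueError("Definition Line may not contain hard returns")
    # The SeqID runs from just after '>' to the first space.
    seq_id, _, rest = line[1:].partition(" ")
    if not is_valid_seqid(seq_id):
        raise ValueError(f"invalid SeqID: {seq_id!r}")
    modifiers = {
        key.strip(): value.strip()
        for key, value in MODIFIER_PATTERN.findall(rest)
    }
    # Whatever remains after removing the bracketed modifiers is free text.
    description = MODIFIER_PATTERN.sub("", rest).strip()
    return seq_id, modifiers, description
```
</code></pre>
<p>For a Definition Line such as <code>>Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds</code>, the sketch yields the SeqID <code>Seq1</code>, the modifiers <code>organism</code> and <code>clone</code>, and the description <code>actin (act) mRNA, partial cds</code>.</p>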
<p>The placement of spaces and hard returns within a FASTA file is critical for the FASTA information and sequence(s) to be read correctly:
<img src="../defline_magnified.jpg"
alt="Format of a FASTA definition line showing placement of spaces and hard returns"/>
</p>
<dl>
<dt>Sample FASTA files showing Definition Lines and sequences</dt>
<dd><pre class="help">
<code>>Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
CCTTTATCTAATCTTTGGAGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTCATCCGTGCAGAA
TAATAATTTTCTTTATAGTAATACCAATCATGATCGGTGGTTTCGGAAACTGACTAGTCCCACTCATAAT
<br/>
>Seq2 [organism=uncultured Bacillus sp.] [isolate=A2] corticotropin (CT) gene, complete cds
GGTAGGTACCGCCCTAAGNCTCCTAATCCGAGCAGAACTANGCCAACCCGGAGCCCTTCTGGGAGACGAC
TCAACACCACCTTCTTTGACCCAGCAGGAGGAGGAGACCCAGTACTATACCAGCACCTATTCTGATTCTT
<br/>
>Seq3 [organism=Phalaenopsis equestris var. leucaspis]
CCTATACCTAATTTTCGGCGCATGAGCCGGAATGGTGGGTACCGCTCTAAGCCTCCTCATTCGAGCAGAA
CTAGGCCAACCCGGAGCCCTTCTGGGAGACGACCAAGTCTACAACGTGGTTGTCACGGCCCATGCCTTCG
<br/>
>Seq9 [organism=Petunia integrifolia subsp. inflata]
TAGTTGGAACAGCCCTCAGCCTACTCATCCGAGCAGAACTAGGCCAACCCGGAACCCTCCTGGGAGATGA
CCAAATCTACAATGTAATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTAATACCAGTCATA
</code></pre></dd>
</dl>
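<p>A file laid out like the samples above can be split into records by treating every line that begins with ">" as a new Definition Line and joining the lines that follow it into one sequence. The Python sketch below illustrates this and then checks two of the content rules (unique SeqIDs, and sequence data after each Definition Line). It is not BankIt's parser: the names <code>read_fasta</code> and <code>find_problems</code> are hypothetical, and skipping blank lines is an assumption.</p>
<pre class="help"><code>
```python
def read_fasta(text: str) -> list:
    """Return (seq_id, definition_line, sequence) tuples from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines between records are ignored
        if line.startswith(">"):
            # The SeqID is the text between '>' and the first space.
            seq_id = line[1:].split(" ", 1)[0]
            records.append([seq_id, line, []])
        elif not records:
            raise ValueError("sequence data found before any Definition Line")
        else:
            records[-1][2].append(line)
    return [(seq_id, defline, "".join(parts)) for seq_id, defline, parts in records]


def find_problems(records) -> list:
    """Report duplicate SeqIDs and records that have no sequence data."""
    problems = []
    seen = set()
    for seq_id, _, sequence in records:
        if seq_id in seen:
            problems.append(f"duplicate SeqID: {seq_id}")
        seen.add(seq_id)
        if not sequence:
            problems.append(f"no sequence data for: {seq_id}")
    return problems
```
</code></pre>
<p>An empty list from <code>find_problems</code> means that these two checks passed; the other content rules are not tested by this sketch.</p>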
<p><a href="sample_files/nucleotide-sample.txt">Sample nucleotide FASTA</a></p>
</div>
</body>
</html>