<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Nucleotide FASTA file</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg" alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>BankIt Submission Help: Nucleotide FASTA file</h1>
<h2>Use Plain Text Format:</h2>
<div class="border1">
<ul>
<li>Use a text editor (for example, WordPad) to prepare the FASTA file of nucleotide sequences.</li>
<li>Be sure to save your file as "Plain Text" or "Text Document".</li>
<li>If you are not sure that the "Save" option in your program does this automatically, use "Save As...". In the "Save as type:" pull-down menu, select "Text Document".</li>
<li>If using Word, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Plain Text (*.txt)".</li>
<li>Do <strong>not</strong> save the file as .doc or .rtf (rich text format); BankIt will not allow you to upload a file that is not plain text.</li>
</ul>
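<p>If you want to confirm that a file is plain text before uploading it, a short script can look for the signatures of common word-processor formats. The Python sketch below is an illustration only and is not part of BankIt: the function name <code>looks_like_plain_text</code> is hypothetical, and the checks BankIt itself performs on upload may differ.</p>
<pre class="help"><code>
```python
def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: True if the raw bytes do not look like a word-processor file."""
    # File signatures of formats that are NOT plain text (assumed list, not BankIt's).
    non_text_signatures = (
        b"{\\rtf",              # Rich Text Format (.rtf) header
        b"\xd0\xcf\x11\xe0",    # legacy Word .doc (OLE2 container)
        b"PK\x03\x04",          # Word .docx (ZIP container)
    )
    if data.startswith(non_text_signatures):
        return False
    # NUL bytes do not occur in a plain-text FASTA file.
    return b"\x00" not in data
```
</code></pre>
<p>To check a file on disk, read it in binary mode and pass the bytes to the function, for example <code>looks_like_plain_text(open("my_sequences.txt", "rb").read())</code>, where the file name is a placeholder.</p>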
</div>
<h2>Content Rules:</h2>
<div class="border1">
<ul>
<li>Each sequence in the FASTA file contains a Definition Line followed by the sequence data.</li>
<li>The Definition Line for each sequence begins with a "&gt;" followed by a Sequence_ID (SeqID). The SeqID identifies the same specimen in all the steps of a submission (for example, in the nucleotide FASTA file, in a protein FASTA file, or in a Source Modifier file).</li>
<li>Each SeqID must be unique within the file.</li>
<li>SeqIDs <strong><em>may not contain spaces</em></strong>.</li>
<li>SeqIDs may contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).</li>
<li>SeqIDs must be 25 characters or fewer.</li>
<li>The SeqID must be separated by a space from the rest of the Definition Line text.</li>
<li>It is recommended that the Definition Line include the organism name. If Organism Names are not included in the FASTA Definition Lines, they must be provided in a separate table on a subsequent page of the submission process.</li>
<li>The Organism Name must be provided in this format: <code><strong>[organism=Organism Name]</strong></code> (opening square bracket, the word "organism", an equals sign, the Organism Name, closing square bracket).</li>
<li><a href="genbank-source-table.html#modifiers">Source Modifiers</a> provided in the FASTA file Definition Line must follow the same format as the Organism Name. Examples: <code>[isolate=mosquito12]</code> <code>[clone=AC3]</code> <code>[strain=BuzzLY]</code></li>
<li>A brief, free-text description of the sequence may follow the formatted Organism Name and Source Modifiers. Examples: 'cytochrome oxidase I, partial CDS' 'trnH-psbA intergenic spacer'</li>
<li>The FASTA Definition Line <strong><em>may not contain</em></strong> any internal hard returns.</li>
<li>However, the FASTA Definition Line <strong><em>must be separated</em></strong> from the actual sequence by a hard return.</li>
</ul>
<p>The placement of spaces and hard returns within a FASTA file is critical for the FASTA information and sequence(s) to be read correctly:
<img src="../defline_magnified.jpg"
alt="Format of a FASTA definition line showing placement of spaces and hard returns"/>
</p>
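<p>The rules above can be checked mechanically before submission. The following Python sketch assumes one Definition Line is passed in as a string without its trailing hard return. The function names <code>check_definition_line</code> and <code>extract_modifiers</code> and the wording of the messages are hypothetical, and the validation BankIt actually applies may be stricter or differ in detail.</p>
<pre class="help"><code>
```python
import re

# Characters this help page allows in a SeqID: letters, digits, - _ . : * #
SEQID_PATTERN = re.compile(r"[A-Za-z0-9_.:*#-]+")
MAX_SEQID_LENGTH = 25
# A bracketed modifier such as [organism=Carpodacus mexicanus]
MODIFIER_PATTERN = re.compile(r"\[([^=\[\]]+)=([^\[\]]+)\]")


def check_definition_line(line: str) -> list[str]:
    """Return the problems found in one Definition Line (empty list if none)."""
    if not line.startswith(">"):
        return ['Definition Line must begin with ">"']
    problems = []
    # Everything up to the first space is the SeqID; the rest is modifiers and free text.
    seq_id, _, _rest = line[1:].partition(" ")
    if not seq_id:
        problems.append("SeqID is missing")
    elif not SEQID_PATTERN.fullmatch(seq_id):
        problems.append("SeqID contains a character that is not allowed")
    if len(seq_id) > MAX_SEQID_LENGTH:
        problems.append("SeqID is longer than 25 characters")
    return problems


def extract_modifiers(line: str) -> dict[str, str]:
    """Collect [name=value] pairs, for example the organism, from a Definition Line."""
    return dict(MODIFIER_PATTERN.findall(line))
```
</code></pre>
<p>A Definition Line that follows the rules yields an empty list from <code>check_definition_line</code>. Because the organism name is recommended rather than required in the FASTA file, its absence from the result of <code>extract_modifiers</code> is left for the caller to handle.</p>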
<dl>
<dt>Sample FASTA file showing Definition Lines and sequences</dt>
<dd><pre class="help">
<code>&gt;Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
CCTTTATCTAATCTTTGGAGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTCATCCGTGCAGAA
TAATAATTTTCTTTATAGTAATACCAATCATGATCGGTGGTTTCGGAAACTGACTAGTCCCACTCATAAT
<br/>
&gt;Seq2 [organism=uncultured Bacillus sp.] [isolate=A2] corticotropin (CT) gene, complete cds
GGTAGGTACCGCCCTAAGNCTCCTAATCCGAGCAGAACTANGCCAACCCGGAGCCCTTCTGGGAGACGAC
TCAACACCACCTTCTTTGACCCAGCAGGAGGAGGAGACCCAGTACTATACCAGCACCTATTCTGATTCTT
<br/>
&gt;Seq3 [organism=Phalaenopsis equestris var. leucaspis]
CCTATACCTAATTTTCGGCGCATGAGCCGGAATGGTGGGTACCGCTCTAAGCCTCCTCATTCGAGCAGAA
CTAGGCCAACCCGGAGCCCTTCTGGGAGACGACCAAGTCTACAACGTGGTTGTCACGGCCCATGCCTTCG
<br/>
&gt;Seq9 [organism=Petunia integrifolia subsp. inflata]
TAGTTGGAACAGCCCTCAGCCTACTCATCCGAGCAGAACTAGGCCAACCCGGAACCCTCCTGGGAGATGA
CCAAATCTACAATGTAATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTAATACCAGTCATA
</code></pre></dd>
</dl>
<p><a href="sample_files/nucleotide-sample.txt">Sample nucleotide FASTA</a></p>
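<p>To inspect a multi-sequence file such as the sample above, the text can be split into records and the SeqIDs checked for uniqueness. The Python sketch below is a minimal illustration, not a BankIt tool: <code>read_fasta</code> and <code>duplicate_seqids</code> are hypothetical names, and the sketch assumes the whole file has already been read into a string.</p>
<pre class="help"><code>
```python
def read_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (definition_line, sequence) pairs."""
    records = []
    definition, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between records carry no data
        if line.startswith(">"):
            # A new Definition Line closes the previous record, if any.
            if definition is not None:
                records.append((definition, "".join(chunks)))
            definition, chunks = line, []
        elif definition is not None:
            chunks.append(line)  # sequence data may span several lines
        # Sequence lines that appear before any Definition Line are ignored.
    if definition is not None:
        records.append((definition, "".join(chunks)))
    return records


def duplicate_seqids(records: list[tuple[str, str]]) -> list[str]:
    """Return each SeqID that appears more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for definition, _sequence in records:
        seq_id = definition[1:].split(" ", 1)[0]
        if seq_id in seen and seq_id not in duplicates:
            duplicates.append(seq_id)
        seen.add(seq_id)
    return duplicates
```
</code></pre>
<p>An empty list from <code>duplicate_seqids</code> means every SeqID in the file is unique, as the content rules require.</p>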
</div>
</body>
</html>