<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
    <title>BankIt Submission Help: Nucleotide FASTA file</title>
    <link rel="stylesheet" href="../../css/bankit.13.6.css"  type="text/css">
    <link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
    <link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>

<body class="help">
    <header id="ncbi_header" class="ncbi-header" role="banner">
        <div class="usa-grid">
            <div class="usa-width-one-whole">
                <div class="ncbi-header__logo">
                    <a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
                       data-ga-action="click_image" data-ga-label="NIH NLM Logo">
                       <img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg" alt="NIH NLM Logo">
                    </a>
                </div>
            </div>
        </div>
    </header>

<h1>BankIt Submission Help: Nucleotide FASTA file</h1>
<h2>Use Plain Text Format:</h2>
<div class="border1">
<ul>
<li>Use a text editor (for example, WordPad) to prepare the FASTA file of nucleotide sequences.</li>
<li>Be sure to save your file as Plain Text or as a Text Document.</li>
<li>If you are not sure whether the "Save" option in your program does this automatically, use "Save As..." and select "Text Document" in the "Save as type:" pull-down menu.</li>
<li>If using Word, select "Save As..." from the File menu, then select "Plain Text (*.txt)" in the "Save as type:" pull-down menu.</li>
<li>Do <strong>not</strong> save the file as .doc or .rtf (Rich Text Format); BankIt will not allow you to upload a file that is not plain text.</li>
</ul>

</div>
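<p>If you want to check a file before uploading it, the following Python sketch looks at the first bytes of the file for the signatures of common formats that are not plain text: RTF files begin with <code>{\rtf</code>, Word .docx files are ZIP archives beginning with <code>PK</code>, and older Word .doc files begin with the OLE2 compound-file signature. It then requires every byte to be ASCII; that stricter test is an assumption of this sketch (it catches the curly quotes word processors often insert), not a stated BankIt rule. The function name <code>check_plain_text</code> is hypothetical and is not part of any NCBI tool.</p>
<pre class="help"><code>from typing import Optional

# Leading bytes ("magic numbers") of common word-processor formats.
NON_TEXT_SIGNATURES = {
    b"{\\rtf": "Rich Text Format (.rtf)",
    b"PK\x03\x04": "ZIP archive (e.g. Word .docx)",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE2 compound file (e.g. Word .doc)",
}


def check_plain_text(data: bytes) -> Optional[str]:
    """Return a short problem description, or None if data looks like plain ASCII text."""
    for magic, kind in NON_TEXT_SIGNATURES.items():
        if data.startswith(magic):
            return f"file looks like {kind}, not plain text"
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        # Stricter than "plain text": any non-ASCII byte (such as a curly quote) is flagged.
        return f"non-ASCII byte at offset {err.start}"
    return None
</code></pre>
<p>For example, <code>check_plain_text(open("my_sequences.txt", "rb").read())</code> returns <code>None</code> for a plain ASCII file and a short message otherwise; the file name is hypothetical.</p>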

<h2>Content Rules:</h2>

<div class="border1">
<ul>
<li>Each sequence in the FASTA file contains a Definition Line followed by the sequence data.</li>
<li>The Definition Line for each sequence begins with a "&gt;" followed by a Sequence_ID (SeqID). The SeqID identifies the same specimen in all the steps of a submission (for example, in the nucleotide FASTA file, in a protein FASTA file, or in a Source Modifier file).</li>
<li>Each SeqID must be unique within the file.</li>
<li>SeqIDs <strong><em>may not contain spaces</em></strong>.</li>
<li>SeqIDs may contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).</li>
<li>SeqIDs must be 25 or fewer characters.</li>
<li>The SeqID must be separated from the rest of the Definition Line text by a space.</li>
<li>It is recommended that the Definition Line include the organism name. If organism names are not included in the FASTA Definition Lines, they must be provided in a separate table on a later page of the submission process.</li>
<li>The Organism Name must be provided in this format: <code><strong>[organism=Organism Name]</strong></code> (an opening square bracket, the word "organism", an equals sign, the organism name, and a closing square bracket).</li>
<li><a href="genbank-source-table.html#modifiers">Source Modifiers</a> provided in the FASTA Definition Line must follow the same format as the Organism Name. Examples: [isolate=mosquito12] [clone=AC3] [strain=BuzzLY]</li>
<li>A brief, free-text description of the sequence may follow the formatted
    Organism Name and Source Modifiers. Examples: 'cytochrome oxidase I, partial CDS', 'trnH-psbA intergenic spacer'</li>
<li>The FASTA Definition Line <strong><em>may not contain</em></strong> any internal hard returns.</li>
<li>However, the FASTA Definition Line <strong><em>must be separated</em></strong> from the actual sequence by a hard return.</li>
</ul>
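<p>The SeqID rules above can be checked mechanically. The Python sketch below (the function <code>check_seqids</code> is hypothetical, not an NCBI tool) takes the SeqID to be the text between the "&gt;" and the first space, and reports SeqIDs that are empty, use characters outside the allowed set, exceed 25 characters, or repeat an earlier SeqID. It assumes "letters" means the ASCII letters A to Z and a to z.</p>
<pre class="help"><code>import re

# Characters the rules allow in a SeqID ("letters" assumed to mean ASCII letters).
ALLOWED_SEQID = re.compile(r"[A-Za-z0-9_.:*#-]+")
MAX_SEQID_LENGTH = 25


def check_seqids(lines):
    """Yield (line_number, message) for every SeqID rule a FASTA file breaks."""
    first_seen = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith(">"):
            continue  # sequence data, not a Definition Line
        # The SeqID runs from just after ">" up to the first space.
        seq_id = line[1:].rstrip("\r\n").split(" ", 1)[0]
        if not seq_id:
            yield lineno, "missing SeqID after '>'"
            continue
        if not ALLOWED_SEQID.fullmatch(seq_id):
            yield lineno, f"SeqID {seq_id!r} contains a disallowed character"
        if len(seq_id) > MAX_SEQID_LENGTH:
            yield lineno, f"SeqID {seq_id!r} is longer than {MAX_SEQID_LENGTH} characters"
        if seq_id in first_seen:
            yield lineno, f"SeqID {seq_id!r} already used on line {first_seen[seq_id]}"
        else:
            first_seen[seq_id] = lineno
</code></pre>
<p>For example, <code>list(check_seqids(open("my_sequences.txt")))</code> lists every problem with its line number; an empty list means the SeqIDs follow these rules. The file name is hypothetical.</p>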

<p>The placement of spaces and hard returns within a FASTA file is critical for the Definition Line information and the sequence(s) to be read correctly:
    <img src="../defline_magnified.jpg"
         alt="Format of a FASTA definition line showing placement of spaces and hard returns"/>
</p>
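<p>Because spaces and hard returns carry the structure, a Definition Line can be split apart with a few string operations. This Python sketch (the name <code>parse_defline</code> is hypothetical) takes one line, treats the text up to the first space as the SeqID, collects every <code>[name=value]</code> pair into a dictionary, and keeps what remains as the free-text description. It is an illustration under these assumptions, not the parser BankIt uses; for example, it does not handle square brackets inside a value, and a repeated modifier name keeps only its last value.</p>
<pre class="help"><code>import re

# One bracketed "name=value" pair, such as [organism=Carpodacus mexicanus].
MODIFIER = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_defline(line):
    """Split one Definition Line into (seq_id, modifiers, description)."""
    if not line.startswith(">"):
        raise ValueError("a Definition Line must begin with '>'")
    # A hard return ends the Definition Line, so only this single line is parsed.
    # The first space separates the SeqID from everything that follows it.
    seq_id, _, rest = line[1:].rstrip("\r\n").partition(" ")
    # Collect [name=value] pairs; a repeated name keeps its last value.
    modifiers = dict(MODIFIER.findall(rest))
    # Text left after removing the bracketed pairs is the free-text description.
    description = " ".join(MODIFIER.sub(" ", rest).split())
    return seq_id, modifiers, description
</code></pre>
<p>Applied to the first sample Definition Line on this page, it returns the SeqID <code>Seq1</code>, the modifiers <code>organism</code> and <code>clone</code>, and the description <code>actin (act) mRNA, partial cds</code>.</p>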

<dl>
<dt>Sample FASTA file showing Definition Lines and sequences</dt>
<dd><pre class="help">
<code>&gt;Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
CCTTTATCTAATCTTTGGAGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTCATCCGTGCAGAA
TAATAATTTTCTTTATAGTAATACCAATCATGATCGGTGGTTTCGGAAACTGACTAGTCCCACTCATAAT
<br/>
&gt;Seq2 [organism=uncultured bacillus sp.] [isolate=A2] corticotropin (CT) gene, complete cds
GGTAGGTACCGCCCTAAGNCTCCTAATCCGAGCAGAACTANGCCAACCCGGAGCCCTTCTGGGAGACGAC
TCAACACCACCTTCTTTGACCCAGCAGGAGGAGGAGACCCAGTACTATACCAGCACCTATTCTGATTCTT
<br/>
&gt;Seq3 [organism=Phalaenopsis equestris var. leucaspis]
CCTATACCTAATTTTCGGCGCATGAGCCGGAATGGTGGGTACCGCTCTAAGCCTCCTCATTCGAGCAGAA
CTAGGCCAACCCGGAGCCCTTCTGGGAGACGACCAAGTCTACAACGTGGTTGTCACGGCCCATGCCTTCG
<br/>
&gt;Seq9 [organism=Petunia integrifolia subsp. inflata]
TAGTTGGAACAGCCCTCAGCCTACTCATCCGAGCAGAACTAGGCCAACCCGGAACCCTCCTGGGAGATGA
CCAAATCTACAATGTAATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTAATACCAGTCATA
</code></pre></dd>
</dl>
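<p>If your sequences are stored in a program, a file in this layout can be generated rather than typed. The Python sketch below (the function <code>format_record</code> is hypothetical) writes one record: a single-line Definition Line with the SeqID, the <code>[organism=...]</code> field, any Source Modifiers, and an optional description, each separated by one space, followed by a hard return and the sequence. It breaks the sequence into 70-character lines only to match the samples above; that width is an assumption, not a rule stated on this page. It rejects any field containing a hard return, since the Definition Line may not contain one.</p>
<pre class="help"><code>def format_record(seq_id, organism, sequence, modifiers=None, description="", width=70):
    """Return one FASTA record laid out like the samples on this page."""
    fields = [">" + seq_id, f"[organism={organism}]"]
    for name, value in (modifiers or {}).items():
        fields.append(f"[{name}={value}]")
    if description:
        fields.append(description)
    # The Definition Line is a single line; fields are separated by one space.
    defline = " ".join(fields)
    if "\n" in defline or "\r" in defline:
        raise ValueError("the Definition Line may not contain hard returns")
    # A hard return separates the Definition Line from the sequence, which is
    # split into fixed-width lines (70 mirrors the samples; an assumption).
    seq_lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([defline] + seq_lines) + "\n"
</code></pre>
<p>Concatenating the returned strings for every sequence, each with a unique SeqID, and saving the result as a plain text file gives a file in the format shown in the sample.</p>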

<p><a href="sample_files/nucleotide-sample.txt">Sample nucleotide FASTA</a></p>

</div>

</body>
</html>