<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Protein FASTA</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg"
alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>BankIt Submission Help: Protein FASTA</h1>
<div class="border1">
<p>The format of the protein FASTA file is similar to the format of the <a href="fasta.html">nucleotide FASTA file</a>.</p>
<p>Like the nucleotide FASTA file, the protein FASTA file contains a Sequence_ID followed by the sequence data, but it does not include the organism name or any other source modifiers.</p>
<p>For the protein FASTA definition line, <strong>start with a &gt; followed by the Sequence_ID of the nucleotide sequence that translates to the protein sequence.</strong></p>
<p>Use the same Sequence_ID for the protein FASTA you used for its corresponding sequence in the nucleotide FASTA file.</p>
<p>There must NOT be a space between the &gt; and the Sequence_ID.</p>
<p>There must be a hard return between the &gt;Sequence_ID line and the actual protein sequence.</p>
<img src="../prot_defline_magnified.jpg" alt="Format of a protein FASTA definition line showing placement of spaces and hard returns">
<p>Correct IUPAC codes for amino acids can be found in the <a href="https://www.ncbi.nlm.nih.gov/books/NBK53702/#gbankquickstart.what_are_the_iupac_cod_2">GenBank Submissions Handbook</a>.</p>
<dl><dt>Sample Protein FASTA</dt>
<dd><pre class="help"><code>&gt;Seq1
LYLIFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMI
GAPDMAFPRMNNMSFWLLPPSFLLLLASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGVSS
ILGAINFITTAINMKPPTLSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGD
PVLYQHLFWFFGHPEVYILIL
<br/>
&gt;Seq2
VGTALXLLIRAELXQPGALLGDDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMN
NMSFWLLPPSFLLLMASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGISSILGAINFITTA
INMKPPALSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPVLYQHLFWFF
GHPEVYILIL
</code>
</pre>
</dd>
<dt>Sample Protein FASTA File</dt>
<dd><a href="sample_files/protein-sample.txt">sample file</a></dd>
</dl>
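<p>The formatting rules above can be checked before upload with a short script. The following Python sketch is illustrative only and is not part of BankIt: the function name <code>check_protein_fasta</code> and the set <code>ALLOWED_RESIDUES</code> are hypothetical, and the residue set (the 20 standard amino acid letters plus the ambiguity codes B, Z and X) is an assumption, so consult the GenBank Submissions Handbook for the authoritative list of codes.</p>
<pre class="help"><code>
```python
# Assumption: the 20 standard amino acid letters plus the ambiguity codes B, Z and X.
ALLOWED_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZX")


def check_protein_fasta(text, nucleotide_ids):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    current_id = None    # label of the record currently being read
    has_residues = True  # whether that record has any sequence data yet

    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new definition line closes the previous record.
            if not has_residues:
                problems.append(f"{current_id}: no sequence after the definition line")
            has_residues = False
            rest = line[1:]
            if not rest or rest[0].isspace():
                # Rule: no space between > and the Sequence_ID.
                problems.append(f"line {number}: Sequence_ID must directly follow >")
                current_id = f"line {number}"
            else:
                current_id = rest.split()[0]
                # Rule: reuse the Sequence_ID from the nucleotide FASTA file.
                if current_id not in nucleotide_ids:
                    problems.append(f"line {number}: {current_id} is not in the nucleotide FASTA")
        elif current_id is None:
            problems.append(f"line {number}: sequence data before any definition line")
        else:
            has_residues = True
            unexpected = sorted(set(line.upper()) - ALLOWED_RESIDUES)
            if unexpected:
                problems.append(f"line {number}: unexpected characters {unexpected}")

    if not has_residues:
        problems.append(f"{current_id}: no sequence after the definition line")
    return problems
```
</code></pre>
<p>Calling <code>check_protein_fasta</code> with the text of the protein FASTA file and the set of Sequence_IDs used in the nucleotide FASTA file returns one message per problem found. Passing this check does not guarantee that BankIt will accept the file.</p>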
<p>For barcode submissions, you may optionally provide a file of protein sequences in FASTA format. This protein FASTA file is not required for barcode submissions.</p>
</div>
</body>
</html>