<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Protein FASTA</title>
<link rel="stylesheet" href="../../css/bankit.13.6.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.6.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.6.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg"
alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>

<h1>BankIt Submission Help: Protein FASTA</h1>

<div class="border1">
<p>The format of the protein FASTA file is similar to the format of the <a href="fasta.html">nucleotide FASTA file</a>.</p>

<p>Like the nucleotide FASTA file, the protein FASTA file contains a Sequence_ID followed by the sequence data, but it does not include the organism name or any other source modifiers.</p>

<p>For the protein FASTA definition line, <strong>start with a > followed by the Sequence_ID of the nucleotide sequence that translates to the protein sequence.</strong></p>

<p>Use the same Sequence_ID in the protein FASTA file that you used for the corresponding sequence in the nucleotide FASTA file.</p>

<p>There must NOT be a space between the > and the Sequence_ID.</p>
<p>There must be a hard return between the >Sequence_ID and the actual protein sequence.</p>
<img src="../prot_defline_magnified.jpg" alt="Format of a protein FASTA definition line showing placement of spaces and hard returns">

<p>Correct IUPAC codes for amino acids can be found in the <a href="https://www.ncbi.nlm.nih.gov/books/NBK53702/#gbankquickstart.what_are_the_iupac_cod_2">GenBank Submissions Handbook</a>.</p>

<dl><dt>Sample Protein FASTA</dt>

<dd><pre class="help"><code>>Seq1
LYLIFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMI
GAPDMAFPRMNNMSFWLLPPSFLLLLASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGVSS
ILGAINFITTAINMKPPTLSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGD
PVLYQHLFWFFGHPEVYILIL
<br/>
>Seq2
VGTALXLLIRAELXQPGALLGDDQIYNVVVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMN
NMSFWLLPPSFLLLMASSTVEAGAGTGWTVYPPLAGNLAHAGASVDLAIFSLHLAGISSILGAINFITTA
INMKPPALSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPVLYQHLFWFF
GHPEVYILIL
</code>
</pre>
</dd>

<dt>Sample Protein FASTA File</dt>
<dd><a href="sample_files/protein-sample.txt">sample file</a></dd>
</dl>
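<p>The definition-line rules above can be checked before upload with a short script. The following Python sketch is an illustration only and is not part of BankIt. The function name <code>check_protein_fasta</code> is hypothetical, and the sketch assumes that blank lines are ignored, that source modifiers are written in square brackets, and that residues only need to be ASCII letters rather than being validated against the full IUPAC table.</p>

```python
def check_protein_fasta(text, nucleotide_ids):
    """Return a list of problems found in protein FASTA text.

    nucleotide_ids holds the Sequence_IDs used in the nucleotide FASTA file.
    """
    problems = []
    current_id = None     # Sequence_ID of the record being read
    has_residues = False  # True once that record has a sequence line

    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.rstrip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith(">"):
            # A new record starts; the previous one must have had a sequence.
            if current_id is not None and not has_residues:
                problems.append(f"{current_id}: no sequence after the definition line")
            rest = line[1:]
            if rest != rest.lstrip():
                problems.append(f"line {number}: space between > and the Sequence_ID")
            fields = rest.split()
            current_id = fields[0] if fields else ""
            has_residues = False
            if not current_id:
                problems.append(f"line {number}: missing Sequence_ID")
            elif current_id not in nucleotide_ids:
                problems.append(f"line {number}: {current_id} is not in the nucleotide FASTA")
            if "[" in rest:
                # assumption: source modifiers look like [organism=...]
                problems.append(f"line {number}: source modifiers are not allowed")
        elif current_id is None:
            problems.append(f"line {number}: sequence data before any definition line")
        elif line.isascii() and line.isalpha():
            has_residues = True
        else:
            problems.append(f"line {number}: unexpected characters in sequence")

    # The last record also needs at least one sequence line.
    if current_id is not None and not has_residues:
        problems.append(f"{current_id}: no sequence after the definition line")
    return problems


sample = ">Seq1\nLYLIFGAWAGMVG\n>Seq2\nVGTALXLLIRAEL\n"
print(check_protein_fasta(sample, {"Seq1", "Seq2"}))  # prints []
```

<p>An empty list means that no problems were found. Each string in a non-empty list names the line or Sequence_ID to correct.</p>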
<p>For barcode submissions, you have the option of providing a file of protein sequences in FASTA format. This protein FASTA file is not required.</p>
</div>
</body>
</html>