<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BankIt Submission Help: Source Modifiers Table</title>
<link rel="stylesheet" href="../../css/bankit.13.17.css" type="text/css">
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.17.css">
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.17.css">
</head>
<body class="help">
<header id="ncbi_header" class="ncbi-header" role="banner">
<div class="usa-grid">
<div class="usa-width-one-whole">
<div class="ncbi-header__logo">
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg" alt="NIH NLM Logo">
</a>
</div>
</div>
</div>
</header>
<h1>Preparing a Source Modifiers Table File for All Source Modifiers</h1>
<div class="message warning">
<p><strong>Important updates:</strong></p>
<ul>
<li>
As <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2023/05/01/sequences-genbank-sra/">previously announced</a>,
all GenBank sequence submissions <strong>require collection-date</strong> and <strong>geo_loc_name</strong>
starting December 2024.
</li>
<li>
Modifiers marked as <strong>deprecated</strong> in the list below will no longer be accepted as separate
modifiers in new submissions starting January 2025. They may still appear on older records and taxonomic
terms will still be shown in the BioSource organism name. This will not affect BioSample submissions.
</li>
</ul>
</div>
<div class="border1"><p>BankIt accepts source modifiers (<em>e.g.</em>, specimen
voucher and isolate) in two ways: as a <strong>tab-delimited</strong> text
file containing a Source Modifiers Table (as described below), or by applying
the same source modifier value to all sequences in the submission using the
input form. To change source modifiers, upload a new table to overwrite the
previous table, or correct or remove a previously entered value in the
form. The current values of all source modifiers appear at the bottom of the page.</p>
<p>For submissions with multiple sequences, we recommend supplying all source
modifiers in a single table file rather than combining a table with values
entered in the input form.</p>
</div>
<h2>Setting up the Source Modifiers Table</h2>
<div class="border1"><p>The Source Modifiers Table is a
<strong>tab-delimited</strong> text file of the source modifiers for all
specimens in a BankIt set.</p>
<p>When used in a source modifier table, the following modifiers must have
'TRUE' as their only value:</p>
<ul>
<li>Germline</li>
<li>Metagenomic</li>
<li>Rearranged</li>
<li>Transgenic</li>
</ul>
<p>See below for an annotated list of <a href="#modifiers">source modifiers</a>.</p>
</div>
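<div class="border1">
<p>The TRUE-only rule above can be checked before upload. The following is a minimal Python sketch, not part of BankIt: the function name <code>find_bad_flags</code> is hypothetical, and it assumes that an empty cell means the modifier is simply not used for that sequence.</p>

```python
import csv
import io

# Modifiers that, per the list above, may only carry the value TRUE.
TRUE_ONLY = {"Germline", "Metagenomic", "Rearranged", "Transgenic"}


def find_bad_flags(table_text):
    """Return (Sequence_ID, column, value) for each non-empty flag that is not TRUE."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flag_columns = [name for name in (reader.fieldnames or []) if name in TRUE_ONLY]
    problems = []
    for row in reader:
        for column in flag_columns:
            # Assumption: an empty or missing cell means "modifier not used".
            value = (row[column] or "").strip()
            if value and value != "TRUE":
                problems.append((row["Sequence_ID"], column, value))
    return problems


sample = "Sequence_ID\tIsolate\tMetagenomic\nSeq1\tA\tTRUE\nSeq2\tB\tyes\nSeq3\tC\t\n"
print(find_bad_flags(sample))
```

<p>In this sample, only the row for Seq2 is reported, because its Metagenomic value is neither empty nor 'TRUE'.</p>
</div>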
<h2>Contents of the Source Modifiers Table</h2>
<div class="border1"><p>The first row in the table contains the labels for each column. Each column after the first is a different source modifier. See below for the <a href="#modifiers">complete list of source modifiers</a>.</p>
<p>The first column contains the Sequence_IDs used to identify each sequence in the <a href="fasta.html">nucleotide FASTA file</a>.</p>
<p>Specimens are identified in the Source Modifiers Table by the same <a href="fasta.html#localid">Sequence_ID</a> used in the FASTA file.</p>
<p>The heading for the first column must be exactly <strong>Sequence_ID</strong> as shown in the sample below.</p>
<p>Each specimen in the set must have a line in the source modifiers file, even if there are no modifiers to apply to the specimen. </p>
<p>Each Sequence_ID may appear only once in the source modifier file.</p>
<p>Shown below are the contents of a <a href="sample_files/source-table-sample.txt">Sample Source Modifiers Table file</a>. Right-click on the link to save as a <strong>tab-delimited</strong> text file.</p>
<table class="example">
<tr>
<td nowrap> Sequence_ID </td>
<td nowrap> Collected_by </td>
<td nowrap> Collection_date </td>
<td nowrap> Country (geo_loc_name) </td>
<td nowrap> Isolation_source </td>
<td nowrap> Isolate </td>
<td nowrap> Lat_Lon </td>
<td nowrap> Specimen_voucher </td>
</tr>
<tr>
<td nowrap> Seq1 </td>
<td nowrap> C. Grant </td>
<td nowrap> 31-Jan-2001 </td>
<td nowrap> USA </td>
<td nowrap> soil </td>
<td nowrap> A </td>
<td nowrap> 13.57 N 24.68 W </td>
<td nowrap> MKP 334 </td>
</tr>
<tr>
<td nowrap> Seq2 </td>
<td nowrap> S. Tracy </td>
<td nowrap> 28-Feb-2002 </td>
<td nowrap> Slovakia </td>
<td nowrap> contaminated soil </td>
<td nowrap> B </td>
<td nowrap> 13.24 N 24.35 W </td>
<td nowrap> MKP 1230 </td>
</tr>
<tr>
<td nowrap> Seq3 </td>
<td nowrap> A. Gardner </td>
<td nowrap> 16-Apr-2001 </td>
<td nowrap> France </td>
<td nowrap> farm soil </td>
<td nowrap> C </td>
<td nowrap> 43.21 N 56.78 W </td>
<td nowrap> 1B-2526 </td>
</tr>
<tr>
<td nowrap> Seq4 </td>
<td nowrap> F. McMurray </td>
<td nowrap> 26-May-2002 </td>
<td nowrap> Germany </td>
<td nowrap> farm runoff water </td>
<td nowrap> D </td>
<td nowrap> 45.32 N 21.34 E </td>
<td nowrap> WBM 86-64 </td>
</tr>
<tr>
<td nowrap> Seq5 </td>
<td nowrap> V. Leigh </td>
<td nowrap> 13-Jun-2003 </td>
<td nowrap> Brazil </td>
<td nowrap> forest soil </td>
<td nowrap> E </td>
<td nowrap> 46.80 N 13.57 E </td>
<td nowrap> 1B-2518 </td>
</tr>
<tr>
<td nowrap> Seq6 </td>
<td nowrap> E. Flynn </td>
<td nowrap> 15-Aug-2000 </td>
<td nowrap> Australia </td>
<td nowrap> river water </td>
<td nowrap> F </td>
<td nowrap> 68.53 S 57.42 E </td>
<td nowrap> WBM 86-65 </td>
</tr>
<tr>
<td nowrap> Seq7 </td>
<td nowrap> G. Kelly </td>
<td nowrap> 26-Oct-2002 </td>
<td nowrap> Mexico </td>
<td nowrap> river bed soil </td>
<td nowrap> G </td>
<td nowrap> 22.44 S 55.77 W </td>
<td nowrap> 1B-2355 </td>
</tr>
</table>
</div>
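<div class="border1">
<p>The structural rules above (a first column headed exactly Sequence_ID, one row per specimen, and no repeated Sequence_ID) can be checked with a short script before upload. This is an illustrative Python sketch only; <code>check_source_table</code> and its arguments are hypothetical names, and the list of FASTA identifiers is assumed to have been collected separately.</p>

```python
import csv
import io


def check_source_table(table_text, fasta_ids):
    """Return a list of problems found; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    if not rows or not rows[0] or rows[0][0] != "Sequence_ID":
        return ["first column heading must be exactly Sequence_ID"]
    problems = []
    seen = set()
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines (an assumption of this sketch)
        seq_id = row[0].strip()
        if seq_id in seen:
            problems.append("duplicate Sequence_ID: " + seq_id)
        seen.add(seq_id)
    # Every sequence in the FASTA file needs a row, even with no modifiers.
    for seq_id in fasta_ids:
        if seq_id not in seen:
            problems.append("no row for Sequence_ID: " + seq_id)
    return problems


table = "Sequence_ID\tIsolate\nSeq1\tA\nSeq2\tB\nSeq2\tC\n"
print(check_source_table(table, ["Seq1", "Seq2", "Seq3"]))
```

<p>For this sample the script reports the repeated Seq2 row and the missing row for Seq3.</p>
</div>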
<h2>Saving the Source Modifiers Table</h2>
<div class="border1"><p>When using a spreadsheet program,
be sure to save your file as <strong>tab-delimited</strong> text.
If you are not sure that the "Save" option in your program will do this for you, use "Save As..."</p>
<p>In Excel, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Text (Tab delimited) (*.txt)."</p>
</div>
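<div class="border1">
<p>If the table is generated by a script rather than a spreadsheet program, the same tab-delimited layout can be produced with Python's standard <code>csv</code> module. This is a minimal sketch under stated assumptions: the helper name <code>to_tab_delimited</code> and the file name in the closing note are illustrative, and the values are taken from the sample table above.</p>

```python
import csv
import io


def to_tab_delimited(rows, columns):
    """Render dictionaries as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()  # first row: the column labels
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"Sequence_ID": "Seq1", "Collection_date": "31-Jan-2001", "Isolate": "A"},
    {"Sequence_ID": "Seq2", "Collection_date": "28-Feb-2002", "Isolate": "B"},
]
# Sequence_ID must be the first column.
text = to_tab_delimited(rows, ["Sequence_ID", "Collection_date", "Isolate"])
print(text)
```

<p>To save the result, write <code>text</code> to a plain text file, for example one named <code>source-table.txt</code>.</p>
</div>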
<a name="modifiers"></a>
<h2>Source Modifiers</h2>
<div class="border1">
<h2>Commonly used Source Modifiers</h2>
<ul class="li-spacing">
<li><strong>Clone</strong> - Name of clone from which sequence was obtained, typically an alphanumeric ID.</li>
<li><strong><a href="collection-date.html">Collection_date</a></strong> - Date the specimen was collected.<br/>
Use the format <strong>DD-Mon-YYYY</strong>: a 2-digit day, a three-letter abbreviation of the month, and a 4-digit year
(<em>e.g.</em>, 11-Feb-2002).<br/>
<strong>Mon-YYYY</strong> and <strong>YYYY</strong> are alternate formats to use when date information is less complete.</li>
<li><strong><a href="https://www.ncbi.nlm.nih.gov/genbank/collab/country/">Country (geo_loc_name)</a></strong> - Where the sequence's organism was
located. May be a country, an ocean, or major sea. Additional region or locality
information must be after the country, ocean, or major sea name and separated by a ':'. For
example: USA: Riverview Park, Ripkentown, MD</li>
<li><strong>Host</strong> - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.</li>
<li><strong>Isolate</strong> - Individual isolate from which the sequence was obtained, typically an alphanumeric sample ID.</li>
<li><strong>Isolation source</strong> - Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived.</li>
<li><strong>Specimen_voucher</strong> - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
<p>This should be provided using the following format:
'institution-code:collection-code:specimen-id'. specimen-id is mandatory;
collection-code is optional; institution-code is mandatory when collection-code
is provided. Examples:</p>
<ul>
<li>99-SRNP</li>
<li>UAM:Mamm:52179</li>
<li>personal collection:Joe Smith:99-SRNP</li>
<li>AMCC:101706</li>
</ul>
</li>
<li><strong>Strain</strong> - Strain of organism from which sequence
was obtained. For microbial records, the strain is an alphanumeric
identifier that may be designated in any manner, for example, it may be
based on the name of an individual or locality. As an example, for
Escherichia coli K12, "K12" is the strain name/identifier.</li>
</ul>
</div>
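<div class="border1">
<p>The Collection_date and Country (geo_loc_name) formats described above lend themselves to a simple pre-submission check. The Python sketch below is illustrative only: the function names are hypothetical, it covers just the three date formats listed on this page (DD-Mon-YYYY, Mon-YYYY, YYYY), and it assumes English three-letter month abbreviations.</p>

```python
import re
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Optional 2-digit day, optional 3-letter month, required 4-digit year.
DATE_PATTERN = re.compile(r"(?:(\d{2})-)?(?:([A-Z][a-z]{2})-)?(\d{4})")


def is_valid_collection_date(value):
    """True for DD-Mon-YYYY, Mon-YYYY or YYYY with a real calendar date."""
    match = DATE_PATTERN.fullmatch(value)
    if not match:
        return False
    day, month, year = match.groups()
    if month is None:
        return day is None  # YYYY alone is fine; a day without a month is not
    if month not in MONTHS:
        return False
    if day is None:
        return True
    try:
        date(int(year), MONTHS.index(month) + 1, int(day))
    except ValueError:
        return False  # e.g. 30-Feb-2002
    return True


def split_geo_loc_name(value):
    """Split 'Country: locality' at the first ':' into (country, locality)."""
    country, _, locality = value.partition(":")
    return country.strip(), locality.strip()


print(is_valid_collection_date("11-Feb-2002"))
print(split_geo_loc_name("USA: Riverview Park, Ripkentown, MD"))
```

<p>The check only confirms the shape of the value. It does not verify that the country, ocean, or sea name appears in the list linked from Country (geo_loc_name).</p>
</div>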
<br/>
<div class="border1">
<p>The following source modifiers are available to further describe the
sequences in a submission:</p>
<ul class="li-spacing" style="margin-left:1em">
<li><strong>Altitude</strong> - Altitude in metres above or below sea level of where the sample was collected. </li>
<li class="deprecated greyout"><strong>Authority</strong> - deprecated - do
not use. <!--The author or authors of the organism name from which
sequence was obtained.--></li>
<li><strong>Bio_material</strong> - An identifier for the biological material from which the nucleotide sequence was obtained, with optional institution code and collection code for the place where it is currently stored.
<p> This should be provided using the following format <strong>'institution-code:collection-code:material_id'</strong>.
material_id is mandatory, institution-code and collection-code are optional; institution-code is mandatory when collection-code is present.
</p>
<p> This qualifier should be used to annotate the identifiers of material in biological collections which include zoos and aquaria, stock centers, seed banks, germplasm repositories and DNA banks.
</p>
</li>
<li class="deprecated greyout"><strong>Biotype</strong> - deprecated - do
not use. <!--Variety of a species (usually a fungus, bacteria, or virus)
characterized by some specific biological property (often geographical,
ecological, or physiological). Same as biotype.--></li>
<li class="deprecated greyout"><strong>Biovar</strong> - deprecated - do
not use. <!-- See biotype --></li>
<li><strong>Breed</strong> - The named breed from which sequence was obtained (usually applied to domesticated mammals).</li>
<li><strong>Cell_line</strong> - Cell line from which sequence was obtained.</li>
<li><strong>Cell_type</strong> - Type of cell from which sequence was obtained.</li>
<li class="deprecated greyout"><strong>Chemovar</strong> - deprecated - do
not use. <!-- Variety of a species (usually a fungus, bacteria, or
virus) characterized by its biochemical properties.--> </li>
<li><strong>Clone</strong> - Name of clone from which sequence was obtained.</li>
<li><strong>Collected_by</strong> - Name of person who collected the sample.</li>
<li><strong><a href="collection-date.html">Collection_date</a></strong> - Date the specimen was collected.
<br/>
Use the format <strong>DD-Mon-YYYY</strong>: a 2-digit day, a three-letter abbreviation of the month, and a 4-digit year
(<em>e.g.</em>, 11-Feb-2002).<br/>
<strong>Mon-YYYY</strong> and <strong>YYYY</strong> are alternate formats to use when date information is less complete.</li>
<li><strong><a
href="https://www.ncbi.nlm.nih.gov/genbank/collab/country/">Country
(geo_loc_name)</a></strong> - Where the sequence's organism was
located. May be a country, an ocean or major sea. Additional region or locality
information must be after the country, ocean, or major sea name and separated by a ':'. For
example: USA: Riverview Park, Ripkentown, MD</li>
<li><strong>Cultivar</strong> - Cultivated variety of plant from which sequence was obtained. </li>
<li><strong>Culture_collection</strong> - Institution code and identifier for the culture from which the nucleotide sequence was obtained, with optional collection code.
<p>This should be provided using the following format
<strong>'institution-code:collection-code:culture-id'</strong>. culture-id and institution-code are mandatory.
</p>
<p> This qualifier should be used to annotate live microbial and viral cultures, and cell lines that have been deposited in curated culture collections.
</p>
</li>
<li><strong>Dev_stage</strong> - Developmental stage of organism.</li>
<li><strong>Ecotype</strong> - The named ecotype (population adapted to a local habitat) from which sequence was obtained (customarily applied to populations of Arabidopsis thaliana).</li>
<li class="deprecated greyout"><strong>Forma</strong> - deprecated - do not
use. <!--The forma (lowest taxonomic unit governed by the nomenclatural
codes) of organism from which sequence was obtained. This term is
usually applied to plants and fungi.--></li>
<li class="deprecated greyout"><strong>Forma_specialis</strong> - deprecated
- do not use. <!-- The physiologically distinct form from which sequence
was obtained (usually restricted to certain parasitic fungi).--></li>
<li><strong>Fwd_primer_name</strong> - Name of forward PCR primer.</li>
<li><strong>Fwd_primer_seq</strong> - Nucleotide sequence of forward PCR primer.</li>
<li><strong>Genotype</strong> - Genotype of the organism.</li>
<li><strong>Haplogroup</strong> - Name for a group of similar haplotypes that share some sequence variation.</li>
<li><strong>Haplotype</strong> - Haplotype of the organism.</li>
<li><strong>Host</strong> - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.</li>
<li class="deprecated greyout"><strong>Identified_by</strong> - deprecated -
do not use. <!--name of the person or persons who identified by
taxonomic name the organism from which the sequence was obtained.--> </li>
<li><strong>Isolate</strong> - Identification or description of the specific individual from which this sequence was obtained.</li>
<li><strong>Isolation source</strong> - Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived.</li>
<li><strong>Lab_host</strong> - Laboratory host used to propagate the organism from which the sequence was obtained.</li>
<li><strong>Lat_Lon</strong> - Latitude and longitude, in decimal degrees, of where the sample was collected. </li>
<li><strong>Note</strong> - Any additional information that you wish to provide about the sequence.</li>
<li class="deprecated greyout"><strong>Pathovar</strong> - deprecated - do
not use. <!-- Variety of a species (usually a fungus, bacteria or virus)
characterized by the biological target of the pathogen. Examples include
Pseudomonas syringae pathovar tomato and Pseudomonas syringae pathovar
tabaci.--></li>
<li class="deprecated greyout"><strong>Pop_variant</strong> - deprecated -
do not use. <!--name of the population variant from which the sequence
was obtained.--></li>
<li><strong>Rev_primer_name</strong> - Name of reverse PCR primer.</li>
<li><strong>Rev_primer_seq</strong> - Nucleotide sequence of reverse PCR primer.</li>
<li><strong>Segment</strong> - Name of viral or phage segment sequenced.</li>
<li class="deprecated greyout"><strong>Serogroup</strong> - deprecated - do
not use. <!--Variety of a species (usually a fungus, bacteria, or virus)
characterized by its antigenic properties. Same as serogroup and
serovar.--></li>
<li><strong>Serotype</strong> - Serological variety of a species
characterized by its antigenic properties.</li>
<li><strong>Serovar</strong> - Serological variety of a species (usually a prokaryote) characterized by its antigenic properties.</li>
<li><strong>Sex</strong> - Sex of the organism from which the sequence was obtained.</li>
<li><strong>Specimen_voucher</strong> - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
<p>This should be provided using the following format:
'institution-code:collection-code:specimen-id'. specimen-id is mandatory;
collection-code is optional; institution-code is mandatory when collection-code
is provided. Examples:</p>
<ul>
<li>99-SRNP</li>
<li>UAM:Mamm:52179</li>
<li>personal collection:Joe Smith:99-SRNP</li>
<li>AMCC:101706</li>
</ul>
</li>
<li><strong>Strain</strong> - Strain of organism from which sequence was obtained.</li>
<li><strong>Sub_species</strong> - Subspecies of organism from which sequence was obtained.</li>
<li class="deprecated greyout"><strong>Subclone</strong> - deprecated - do
not use. <!-- Name of subclone from which sequence was obtained. --></li>
<li class="deprecated greyout"><strong>Subtype</strong> - deprecated - do
not use. <!-- Subtype of organism from which sequence was obtained. --></li>
<li class="deprecated greyout"><strong>Substrain</strong> - deprecated - do
not use. <!-- Sub-strain of organism from which sequence was
obtained.--></li>
<li><strong>Tissue_lib</strong> - Tissue library from which the sequence was obtained.</li>
<li><strong>Tissue_type</strong> - Type of tissue from which sequence was obtained.</li>
<li class="deprecated greyout"><strong>Type</strong> - deprecated - do not
use. <!-- Type of organism from which sequence was obtained.--></li>
<li><strong>Variety</strong> - Variety of organism from which sequence was obtained.</li>
</ul>
</div>
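<div class="border1">
<p>Specimen_voucher, Bio_material, and Culture_collection share the colon-separated 'institution-code:collection-code:id' layout described above. The following Python sketch shows one way to split such a value into its parts. It is not an NCBI tool: the function name <code>parse_structured_id</code> and the <code>require_institution</code> flag are hypothetical, and the flag models the stricter Culture_collection rule that the institution code is mandatory.</p>

```python
def parse_structured_id(value, require_institution=False):
    """Split 'institution:collection:id', 'institution:id' or 'id' into parts."""
    parts = [part.strip() for part in value.split(":")]
    if len(parts) > 3 or not parts[-1]:
        raise ValueError("expected at most three ':'-separated parts: " + repr(value))
    if len(parts) == 1:
        institution, collection = None, None
    elif len(parts) == 2:
        # Two parts are read as institution-code and id, since a
        # collection-code requires an institution-code.
        institution, collection = parts[0], None
    else:
        institution, collection = parts[0], parts[1]
    if require_institution and not institution:
        raise ValueError("institution-code is mandatory: " + repr(value))
    return {"institution": institution, "collection": collection, "id": parts[-1]}


for example in ["99-SRNP", "UAM:Mamm:52179", "AMCC:101706"]:
    print(parse_structured_id(example))
```

<p>Passing <code>require_institution=True</code> rejects a bare identifier such as 99-SRNP, which is acceptable for Specimen_voucher but not for Culture_collection.</p>
</div>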
</body>
</html>