<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
|
|
|
<html>
|
|
<head>
|
|
<title>BankIt Submission Help: Source Modifiers Table</title>
|
|
<link rel="stylesheet" href="../../css/bankit.13.17.css" type="text/css">
|
|
<link rel="stylesheet" type="text/css" href="../../css/sp_3_74_ncbi_header.13.17.css">
|
|
<link rel="stylesheet" type="text/css" href="../../css/sp_1_82_layout.13.17.css">
|
|
</head>
|
|
|
|
<body class="help">
|
|
<header id="ncbi_header" class="ncbi-header" role="banner">
|
|
<div class="usa-grid">
|
|
<div class="usa-width-one-whole">
|
|
<div class="ncbi-header__logo">
|
|
<a href="https://www.ncbi.nlm.nih.gov/" class="logo" aria-label="NCBI Logo"
|
|
data-ga-action="click_image" data-ga-label="NIH NLM Logo">
|
|
<img src="https://www.ncbi.nlm.nih.gov/coreutils/nwds/img/logos/AgencyLogo.svg" alt="NIH NLM Logo">
|
|
</a>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
</header>
|
|
<h1>Preparing a Source Modifiers Table File for All Source Modifiers</h1>
|
|
<div class="message warning">
|
|
<p><strong>Important updates:</strong></p>
|
|
<ul>
|
|
<li>
|
|
As <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2023/05/01/sequences-genbank-sra/">previously announced</a>,
all GenBank sequence submissions require <strong>Collection_date</strong> and <strong>geo_loc_name</strong>
starting December 2024.
|
|
</li>
|
|
<li>
|
|
Modifiers marked as <strong>deprecated</strong> in the list below will no longer be accepted as separate
modifiers in new submissions starting January 2025. They may still appear on older records, and taxonomic
terms will still be shown in the BioSource organism name. This does not affect BioSample submissions.
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
|
|
<div class="border1"><p>BankIt accepts source modifiers (<em>e.g.</em>, specimen
voucher and isolate) in two ways: as a <strong>tab-delimited</strong> text
file containing a Source Modifiers Table (as described below), or by applying
the same source modifier value to all sequences in the submission using the
input form. To change source modifiers, upload a new table to overwrite the
previous table, or correct or remove a previously entered value in the
form. The current values of all source modifiers appear at the bottom of the page.</p>

<p>For submissions with multiple sequences, it is recommended that you supply
all the source modifiers you want to add in a single table file, and that you
do not add source modifiers using both a table and the input form.</p>
|
|
</div>
|
|
|
|
<h2>Setting up the Source Modifiers Table</h2>
|
|
|
|
<div class="border1"><p>The Source Modifiers Table is a
|
|
<strong>tab-delimited</strong> text file of the source modifiers for all
|
|
specimens in a BankIt set.</p>
|
|
|
|
<p>When any of the following modifiers is used, its value in the
Source Modifiers Table must be 'TRUE':
|
|
<ul>
|
|
<li>Germline</li>
|
|
<li>Metagenomic</li>
|
|
<li>Rearranged</li>
|
|
<li>Transgenic</li>
|
|
</ul>
|
|
</p>
|
|
|
|
<p>See below for an annotated list of <a href="#modifiers">source modifiers</a>.</p>
|
|
</div>
|
|
<h2>Contents of the Source Modifiers Table</h2>
|
|
|
|
<div class="border1"><p>The first row in the table contains the labels for each column. Each column in the table is a different source modifier. See below for the <a href="#modifiers">complete list of source modifiers</a>. </p>
|
|
|
|
<p>The first column contains the Sequence_IDs used to identify each sequence in the <a href="fasta.html">nucleotide FASTA file</a>.</p>
|
|
|
|
<p>Specimens are identified in the Source Modifiers Table by the same <a href="fasta.html#localid">Sequence_ID</a> used in the FASTA file.</p>
|
|
|
|
<p>The heading for the first column must be exactly <strong>Sequence_ID</strong> as shown in the sample below.</p>
|
|
|
|
<p>Each specimen in the set must have a line in the source modifiers file, even if there are no modifiers to apply to the specimen. </p>
|
|
|
|
<p>Each Sequence_ID may appear only once in the source modifier file.</p>
|
|
|
|
<p>Shown below are the contents of a <a href="sample_files/source-table-sample.txt">Sample Source Modifiers Table file</a>. Right-click on the link to save as a <strong>tab-delimited</strong> text file.</p>
|
|
|
|
<table class="example">
|
|
<tr>
|
|
<td nowrap> Sequence_ID </td>
|
|
<td nowrap> Collected_by </td>
|
|
<td nowrap> Collection_date </td>
|
|
<td nowrap> Country (geo_loc_name) </td>
|
|
<td nowrap> Isolation_source </td>
|
|
<td nowrap> Isolate </td>
|
|
<td nowrap> Lat_Lon </td>
|
|
<td nowrap> Specimen_voucher </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq1 </td>
|
|
<td nowrap> C. Grant </td>
|
|
<td nowrap> 31-Jan-2001 </td>
|
|
<td nowrap> USA </td>
|
|
<td nowrap> soil </td>
|
|
<td nowrap> A </td>
|
|
<td nowrap> 13.57 N 24.68 W </td>
|
|
<td nowrap> MKP 334 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq2 </td>
|
|
<td nowrap> S. Tracy </td>
|
|
<td nowrap> 28-Feb-2002 </td>
|
|
<td nowrap> Slovakia </td>
|
|
<td nowrap> contaminated soil </td>
|
|
<td nowrap> B </td>
|
|
<td nowrap> 13.24 N 24.35 W </td>
|
|
<td nowrap> MKP 1230 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq3 </td>
|
|
<td nowrap> A. Gardner </td>
|
|
<td nowrap> 16-Apr-2001 </td>
|
|
<td nowrap> France </td>
|
|
<td nowrap> farm soil </td>
|
|
<td nowrap> C </td>
|
|
<td nowrap> 43.21 N 56.78 W </td>
|
|
<td nowrap> 1B-2526 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq4 </td>
|
|
<td nowrap> F. McMurray </td>
|
|
<td nowrap> 26-May-2002 </td>
|
|
<td nowrap> Germany </td>
|
|
<td nowrap> farm runoff water </td>
|
|
<td nowrap> D </td>
|
|
<td nowrap> 45.32 N 21.34 E </td>
|
|
<td nowrap> WBM 86-64 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq5 </td>
|
|
<td nowrap> V. Leigh </td>
|
|
<td nowrap> 13-Jun-2003 </td>
|
|
<td nowrap> Brazil </td>
|
|
<td nowrap> forest soil </td>
|
|
<td nowrap> E </td>
|
|
<td nowrap> 46.80 N 13.57 E </td>
|
|
<td nowrap> 1B-2518 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq6 </td>
|
|
<td nowrap> E. Flynn </td>
|
|
<td nowrap> 15-Aug-2000 </td>
|
|
<td nowrap> Australia </td>
|
|
<td nowrap> river water </td>
|
|
<td nowrap> F </td>
|
|
<td nowrap> 68.53 S 57.42 E </td>
|
|
<td nowrap> WBM 86-65 </td>
|
|
</tr>
|
|
<tr>
|
|
<td nowrap> Seq7 </td>
|
|
<td nowrap> G. Kelly </td>
|
|
<td nowrap> 26-Oct-2002 </td>
|
|
<td nowrap> Mexico </td>
|
|
<td nowrap> river bed soil </td>
|
|
<td nowrap> G </td>
|
|
<td nowrap> 22.44 S 55.77 W </td>
|
|
<td nowrap> 1B-2355 </td>
|
|
</tr>
|
|
</table>
|
|
|
|
</div>
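<div class="border1">
<p>The rules above can be illustrated with a short script. This is a minimal sketch, not part of BankIt: the function name <code>build_table</code> and the sample rows are hypothetical, and only a subset of the columns from the sample table is used. It checks that the first heading is exactly Sequence_ID and that no Sequence_ID is repeated, then produces tab-delimited text.</p>

```python
import csv
import io

# Hypothetical rows for illustration; the first column must be Sequence_ID.
HEADER = ["Sequence_ID", "Collection_date", "Isolate"]
ROWS = [
    ["Seq1", "31-Jan-2001", "A"],
    ["Seq2", "28-Feb-2002", "B"],
]


def build_table(header, rows):
    """Return the table as tab-delimited text after two basic checks."""
    if header[0] != "Sequence_ID":
        raise ValueError("first column heading must be exactly Sequence_ID")
    ids = [row[0] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("each Sequence_ID may appear only once")
    buffer = io.StringIO()
    # A tab delimiter produces the tab-delimited layout described above.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


table_text = build_table(HEADER, ROWS)
print(table_text, end="")
```

<p>To produce a file for upload, the returned text could be written to a plain-text file (for example with a .txt extension). The script does not check that every sequence in the FASTA file has a row; that comparison would need the FASTA Sequence_IDs as well.</p>
</div>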
|
|
<h2>Saving the Source Modifiers Table</h2>
|
|
|
|
<div class="border1"><p>When using a spreadsheet program,
be sure to save your file as <strong>tab-delimited</strong> text.
If you are not sure that the "Save" option in your program will do this for you, use "Save As..." instead.</p>
|
|
|
|
<p>In Excel, select "Save As..." from the File menu. In the "Save as type:" pull-down menu, select "Text (Tab delimited) (*.txt)."</p>
|
|
</div>
|
|
<a name="modifiers"></a>
|
|
<h2>Source Modifiers</h2>
|
|
|
|
|
|
|
|
<div class="border1">
|
|
<h2>Commonly Used Source Modifiers</h2>
|
|
<ul class="li-spacing">
|
|
<li><strong>Clone</strong> - Name of clone from which sequence was obtained, typically an alphanumeric ID.</li>
|
|
<li><strong><a href="collection-date.html">Collection_date</a></strong> - Date the specimen was collected.<br/>
Use the format <strong>DD-Mon-YYYY</strong>, that is, 2-digit day, three-letter abbreviation of the month, and 4-digit year
(<em>e.g.</em>, 11-Feb-2002).<br/>
<strong>Mon-YYYY</strong> and <strong>YYYY</strong> are alternate formats to use when date information is less complete.</li>
|
|
<li><strong><a href="https://www.ncbi.nlm.nih.gov/genbank/collab/country/">Country (geo_loc_name)</a></strong> - Where the sequence's organism was
located. May be a country, an ocean, or a major sea. Any additional region or locality
information must follow the country, ocean, or major sea name, separated from it by a ':'. For
example: USA: Riverview Park, Ripkentown, MD</li>
|
|
<li><strong>Host</strong> - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.</li>
|
|
<li><strong>Isolate</strong> - Individual isolate from which the sequence was obtained, typically an alphanumeric sample ID.</li>
|
|
<li><strong>Isolation_source</strong> - Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived.</li>
|
|
<li><strong>Specimen_voucher</strong> - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
|
|
<p>This should be provided using the format
'institution-code:collection-code:specimen-id'. specimen-id is mandatory;
collection-code is optional; institution-code is mandatory when collection-code
is provided. Examples:</p>
|
|
<ul>
|
|
<li>99-SRNP</li>
|
|
<li>UAM:Mamm:52179</li>
|
|
<li>personal collection:Joe Smith:99-SRNP</li>
|
|
<li>AMCC:101706</li>
|
|
</ul>
|
|
</li>
|
|
<li><strong>Strain</strong> - Strain of organism from which sequence
was obtained. For microbial records, the strain is an alphanumeric
identifier that may be designated in any manner; for example, it may be
based on the name of an individual or locality. For
Escherichia coli K12, "K12" is the strain name/identifier.</li>
|
|
</ul>
|
|
</div>
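<div class="border1">
<p>The three Collection_date formats listed above (DD-Mon-YYYY, Mon-YYYY, and YYYY) can be checked before upload with a small script. This is a sketch under stated assumptions: the function name <code>is_valid_collection_date</code> is hypothetical, English three-letter month abbreviations are assumed, and only the formats named on this page are tested. It does not reproduce BankIt's own validation, which may accept other formats.</p>

```python
import calendar
import re

# English month abbreviations are an assumption of this sketch.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Optional 2-digit day, optional 3-letter month, mandatory 4-digit year.
PATTERN = re.compile(r"(?:(?:(\d{2})-)?([A-Z][a-z]{2})-)?(\d{4})")


def is_valid_collection_date(value):
    """Return True for DD-Mon-YYYY, Mon-YYYY, or YYYY values."""
    match = PATTERN.fullmatch(value)
    if match is None:
        return False
    day, month, year = match.groups()
    if month is None:
        return True  # YYYY only
    if month not in MONTHS:
        return False
    if day is None:
        return True  # Mon-YYYY
    # Reject days that do not exist in the given month and year.
    last_day = calendar.monthrange(int(year), MONTHS.index(month) + 1)[1]
    return int(day) in range(1, last_day + 1)


for example in ["11-Feb-2002", "Feb-2002", "2002", "30-Feb-2002"]:
    print(example, is_valid_collection_date(example))
```

<p>The check is deliberately strict about the 2-digit day, so a value such as 2-Feb-2002 is reported as invalid by this sketch.</p>
</div>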
|
|
<br/>
|
|
<div class="border1">
|
|
<p>The following source modifiers are available to further describe the
|
|
sequences in a submission:</p>
|
|
<ul class="li-spacing" style="margin-left:1em">
|
|
<li><strong>Altitude</strong> - Altitude, in metres above or below sea level, at which the sample was collected.</li>
|
|
<li class="deprecated greyout"><strong>Authority</strong> - deprecated - do
|
|
not use. <!--The author or authors of the organism name from which
|
|
sequence was obtained.--></li>
|
|
<li><strong>Bio_material</strong> - An identifier for the biological material from which the nucleotide sequence was obtained, with optional institution code and collection code for the place where it is currently stored.
|
|
<p>This should be provided using the format <strong>'institution-code:collection-code:material_id'</strong>.
material_id is mandatory; institution-code and collection-code are optional, but institution-code is mandatory when collection-code is present.
</p>
<p>Use this qualifier for the identifiers of material in biological collections, including zoos and aquaria, stock centers, seed banks, germplasm repositories, and DNA banks.
</p>
|
|
</li>
|
|
<li class="deprecated greyout"><strong>Biotype</strong> - deprecated - do
|
|
not use. <!--Variety of a species (usually a fungus, bacteria, or virus)
|
|
characterized by some specific biological property (often geographical,
|
|
ecological, or physiological). Same as biotype.--></li>
|
|
<li class="deprecated greyout"><strong>Biovar</strong> - deprecated - do
|
|
not use. <!-- See biotype --></li>
|
|
<li><strong>Breed</strong> - The named breed from which sequence was obtained (usually applied to domesticated mammals).</li>
|
|
<li><strong>Cell_line</strong> - Cell line from which sequence was obtained.</li>
|
|
<li><strong>Cell_type</strong> - Type of cell from which sequence was obtained.</li>
|
|
<li class="deprecated greyout"><strong>Chemovar</strong> - deprecated - do
|
|
not use. <!-- Variety of a species (usually a fungus, bacteria, or
|
|
virus) characterized by its biochemical properties.--> </li>
|
|
<li><strong>Clone</strong> - Name of clone from which sequence was obtained.</li>
|
|
<li><strong>Collected_by</strong> - Name of person who collected the sample.</li>
|
|
<li><strong><a href="collection-date.html">Collection_date</a></strong> - Date the specimen was collected.
<br/>
Use the format <strong>DD-Mon-YYYY</strong>, that is, 2-digit day, three-letter abbreviation of the month, and 4-digit year
(<em>e.g.</em>, 11-Feb-2002).<br/>
<strong>Mon-YYYY</strong> and <strong>YYYY</strong> are alternate formats to use when date information is less complete.</li>
|
|
<li><strong><a
href="https://www.ncbi.nlm.nih.gov/genbank/collab/country/">Country
(geo_loc_name)</a></strong> - Where the sequence's organism was
located. May be a country, an ocean, or a major sea. Any additional region or locality
information must follow the country, ocean, or major sea name, separated from it by a ':'. For
example: USA: Riverview Park, Ripkentown, MD</li>
|
|
<li><strong>Cultivar</strong> - Cultivated variety of plant from which sequence was obtained. </li>
|
|
<li><strong>Culture_collection</strong> - Institution code and identifier for the culture from which the nucleotide sequence was obtained, with optional collection code.
|
|
<p>This should be provided using the following format
|
|
<strong>'institution-code:collection-code:culture-id'</strong>. culture-id and institution-code are mandatory.
|
|
</p>
|
|
<p> This qualifier should be used to annotate live microbial and viral cultures, and cell lines that have been deposited in curated culture collections.
|
|
</p>
|
|
</li>
|
|
<li><strong>Dev_stage</strong> - Developmental stage of organism.</li>
|
|
<li><strong>Ecotype</strong> - The named ecotype (population adapted to a local habitat) from which sequence was obtained (customarily applied to populations of Arabidopsis thaliana).</li>
|
|
<li class="deprecated greyout"><strong>Forma</strong> - deprecated - do not
|
|
use. <!--The forma (lowest taxonomic unit governed by the nomenclatural
|
|
codes) of organism from which sequence was obtained. This term is
|
|
usually applied to plants and fungi.--></li>
|
|
<li class="deprecated greyout"><strong>Forma_specialis</strong> - deprecated
|
|
- do not use. <!-- The physiologically distinct form from which sequence
|
|
was obtained (usually restricted to certain parasitic fungi).--></li>
|
|
<li><strong>Fwd_primer_name</strong> - Name of forward PCR primer.</li>
<li><strong>Fwd_primer_seq</strong> - Nucleotide sequence of forward PCR primer.</li>
<li><strong>Genotype</strong> - Genotype of the organism.</li>
<li><strong>Haplogroup</strong> - Name for a group of similar haplotypes that share some sequence variation.</li>
|
|
<li><strong>Haplotype</strong> - Haplotype of the organism.</li>
|
|
<li><strong>Host</strong> - When the sequence submission is from an organism that exists in a symbiotic, parasitic, or other special relationship with some second organism, the 'host' modifier can be used to identify the name of the host species.</li>
|
|
<li class="deprecated greyout"><strong>Identified_by</strong> - deprecated -
|
|
do not use. <!--name of the person or persons who identified by
|
|
taxonomic name the organism from which the sequence was obtained.--> </li>
|
|
<li><strong>Isolate</strong> - Identification or description of the specific individual from which this sequence was obtained.</li>
|
|
<li><strong>Isolation_source</strong> - Describes the physical, environmental and/or local geographical source of the biological sample from which the sequence was derived.</li>
|
|
<li><strong>Lab_host</strong> - Laboratory host used to propagate the organism from which the sequence was obtained.</li>
|
|
<li><strong>Lat_Lon</strong> - Latitude and longitude, in decimal degrees, of the location where the sample was collected (<em>e.g.</em>, 13.57 N 24.68 W).</li>
|
|
<li><strong>Note</strong> - Any additional information that you wish to provide about the sequence.</li>
|
|
<li class="deprecated greyout"><strong>Pathovar</strong> - deprecated - do
|
|
not use. <!-- Variety of a species (usually a fungus, bacteria or virus)
|
|
characterized by the biological target of the pathogen. Examples include
|
|
Pseudomonas syringae pathovar tomato and Pseudomonas syringae pathovar
|
|
tabaci.--></li>
|
|
<li class="deprecated greyout"><strong>Pop_variant</strong> - deprecated -
|
|
do not use. <!--name of the population variant from which the sequence
|
|
was obtained.--></li>
|
|
<li><strong>Rev_primer_name</strong> - Name of reverse PCR primer.</li>
<li><strong>Rev_primer_seq</strong> - Nucleotide sequence of reverse PCR primer.</li>
<li><strong>Segment</strong> - Name of viral or phage segment sequenced.</li>
|
|
<li class="deprecated greyout"><strong>Serogroup</strong> - deprecated - do
|
|
not use. <!--Variety of a species (usually a fungus, bacteria, or virus)
|
|
characterized by its antigenic properties. Same as serogroup and
|
|
serovar.--></li>
|
|
<li><strong>Serotype</strong> - Serological variety of a species
characterized by its antigenic properties.</li>
<li><strong>Serovar</strong> - Serological variety of a species (usually a prokaryote) characterized by its antigenic properties.</li>
|
|
<li><strong>Sex</strong> - Sex of the organism from which the sequence was obtained.</li>
|
|
<li><strong>Specimen_voucher</strong> - An identifier of the individual or collection of the source organism and the place where it is currently stored, usually an institution.
|
|
<p>This should be provided using the format
'institution-code:collection-code:specimen-id'. specimen-id is mandatory;
collection-code is optional; institution-code is mandatory when collection-code
is provided. Examples:</p>
|
|
<ul>
|
|
<li>99-SRNP</li>
|
|
<li>UAM:Mamm:52179</li>
|
|
<li>personal collection:Joe Smith:99-SRNP</li>
|
|
<li>AMCC:101706</li>
|
|
</ul>
|
|
</li>
|
|
<li><strong>Strain</strong> - Strain of organism from which sequence was obtained.</li>
|
|
<li><strong>Sub_species</strong> - Subspecies of organism from which sequence was obtained.</li>
|
|
<li class="deprecated greyout"><strong>Subclone</strong> - deprecated - do
|
|
not use. <!-- Name of subclone from which sequence was obtained. --></li>
|
|
<li class="deprecated greyout"><strong>Subtype</strong> - deprecated - do
|
|
not use. <!-- Subtype of organism from which sequence was obtained. --></li>
|
|
<li class="deprecated greyout"><strong>Substrain</strong> - deprecated - do
|
|
not use. <!-- Sub-strain of organism from which sequence was
|
|
obtained.--></li>
|
|
<li><strong>Tissue_lib</strong> - Tissue library from which the sequence was obtained.</li>
|
|
<li><strong>Tissue_type</strong> - Type of tissue from which sequence was obtained.</li>
|
|
<li class="deprecated greyout"><strong>Type</strong> - deprecated - do not
|
|
use. <!-- Type of organism from which sequence was obtained.--></li>
|
|
<li><strong>Variety</strong> - Variety of organism from which sequence was obtained.</li>
|
|
</ul>
|
|
</div>
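<div class="border1">
<p>The 'institution-code:collection-code:specimen-id' format described for Specimen_voucher can be made concrete with a short parser. This is an illustrative sketch only: the function name <code>parse_voucher</code> and the dictionary keys are hypothetical, and a two-field value is read as institution-code followed by specimen-id, which is an interpretation of the rule that institution-code is mandatory when collection-code is provided.</p>

```python
def parse_voucher(value):
    """Split a Specimen_voucher value into its colon-separated parts."""
    parts = [part.strip() for part in value.split(":")]
    if "" in parts:
        raise ValueError("fields must not be empty")
    if len(parts) == 1:
        # Only the mandatory specimen-id is present.
        institution, collection, specimen = None, None, parts[0]
    elif len(parts) == 2:
        # Assumption: two fields mean institution-code:specimen-id.
        institution, collection, specimen = parts[0], None, parts[1]
    elif len(parts) == 3:
        institution, collection, specimen = parts
    else:
        raise ValueError("expected at most three ':'-separated fields")
    return {
        "institution_code": institution,
        "collection_code": collection,
        "specimen_id": specimen,
    }


# The example values listed for Specimen_voucher on this page.
for example in ["99-SRNP", "UAM:Mamm:52179",
                "personal collection:Joe Smith:99-SRNP", "AMCC:101706"]:
    print(example, parse_voucher(example))
```

<p>The same splitting approach would apply to the Bio_material and Culture_collection formats, with the caveat that Culture_collection also requires the institution-code.</p>
</div>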
|
|
</body>
|
|
</html>
|