<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>GEOarchive submission instructions - GEO - NCBI</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="author" content="geo" />
<meta name="keywords" content="NCBI, national institutes of health, nih, database, archive, central, bioinformatics, biomedicine, geo, gene, expression, omnibus, chips, microarrays, oligonucleotide, array, sage, CGH" />
<meta name="description" content="Gene Expression Omnibus (GEO) is a database repository of high throughput gene expression data and hybridization arrays, chips, microarrays." />
<meta name="ncbiaccordion" content="collapsible: true, active: false" />
<meta name="ncbi_app" content="geo" />
<meta name="ncbi_pdid" content="documentation" />
<meta name="ncbi_page" content="GEOarchive submission instructions" />
<meta name="galert_type" content="submission" />
<link rel="shortcut icon" href="/geo/img/OmixIconBare.ico" />
<link rel="stylesheet" type="text/css" href="/geo/css/reset.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/nav.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/info.css" />
<script type="text/javascript" src="/core/jig/1.15.10/js/jig.min.js"></script>
<script type="text/javascript" src="/geo/js/dd_menu.js"></script>
<script type="text/javascript" src="/geo/js/info.js"></script>
<script type="text/javascript">
jQuery.getScript("/core/alerts/alerts.js", function () {
galert(['#crumbs_login_bar', 'body &gt; *:nth-child(1)'])
});
</script>
<script type="text/javascript">
var ncbi_startTime = new Date();
</script>
</head>
<body id="info" class="spreadsheet">
<div id="all">
<div id="page">
<div id="header">
<div id="ncbi_logo">
<a href="/">
<img src="/geo/img/ncbi_logo.gif" alt="NCBI Logo" />
</a>
</div>
<div id="geo_logo">
<a href="/geo/"><img src="/geo/img/geo_main.gif" alt="GEO Logo" /></a>
</div>
</div>
<div id="nav_bar">
<ul id="geo_nav_bar">
<li><a href="#">GEO Publications</a>
<ul class="sublist">
<li><a href="/geo/info/GEOHandoutFinal.pdf">Handout</a></li>
<li><a href="/pmc/articles/PMC10767856/">NAR 2024 (latest)</a></li>
<li><a href="/pmc/articles/PMC99122/">NAR 2002 (original)</a></li>
<li><a href="/pmc/?term=10767856,4944384,3531084,3341798,3013736,2686538,2270403,1669752,1619900,1619899,539976,99122">All publications</a></li>
</ul>
</li>
<li><a href="/geo/info/faq.html">FAQ</a></li>
<li><a href="/geo/info/MIAME.html" title="Minimum Information About a Microarray Experiment">MIAME</a></li>
<li><a href="mailto:geo@ncbi.nlm.nih.gov">Email GEO</a></li>
</ul>
</div>
<div id="crumbs_login_bar"><a title="NCBI home page" href="/">NCBI</a> »
<a id="curr_page" title="GEO home page" href="/geo/">GEO</a> »
<a title="GEO documentation guide" href="/geo/info/">Info</a> »
<span>GEOarchive submission instructions</span><span id="login_status"><a href="/geo/submitter/" title="Click here to login. You need to do this only if you want to edit the contact information, submit data, see your unreleased data, or work with data already submitted by you. You do not need to login if you are here just to browse through public holdings">Login</a></span></div>
<div id="content">
<a name="top" id="top"></a>
<h1>GEOarchive submission instructions</h1>
<p class="highlight">
Starting in January 2025, GEO will no longer accept SAGE submissions.
Please <a href="mailto:geo@ncbi.nlm.nih.gov">contact GEO</a> if you have any questions.
</p>
<ul class="doc_list">
<li><a href="#GEOarchive">GEOarchive format</a></li>
<li><a href="#submit">How to submit</a></li>
<li><a href="#GAtemplates">GEOarchive templates and examples</a>
<ul>
<li><a href="#microarray">Microarray</a>
<ul>
<li><span>Affymetrix</span></li>
<li><span>Agilent</span></li>
<li><span>Nimblegen</span></li>
<li><span>Illumina</span></li>
<li><span>Generic</span></li>
<li><span>Platform only</span></li>
</ul>
</li>
<li><a href="#seq">High-throughput sequencing</a></li>
<li><a href="#other">Other data types</a>
<ul>
<li><span>NanoString (nCounter: RCC raw data files)</span></li>
<li><span>Xenium</span></li>
<li><span>MERFISH</span></li>
<li><span>RT-PCR</span></li>
<li><span>Traditional SAGE</span></li>
</ul>
</li>
</ul>
</li>
<li><a href="#notes">Notes for Microsoft Excel users</a> </li>
</ul>
<div class="highlight">
<b>WARNING:</b> If you are submitting human data, it is your responsibility to comply with
<a href="/geo/info/faq.html#patient">Human Subject Guidelines</a>.
</div>
<a name="GEOarchive" id="GEOarchive"></a>
<h2>GEOarchive format<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p class="unfloat">
GEOarchive is a flexible spreadsheet-based submission format useful for batch deposit of experiments.
GEOarchive submissions can be created in any spreadsheet software; Microsoft Excel is the most common choice.
</p>
<p>
A GEOarchive submission consists of several parts as follows:
</p>
<table class="overview">
<tbody>
<tr>
<th>Metadata spreadsheet</th>
<td>
'Metadata' refers to descriptive information and protocols for the overall experiment and
individual Samples. This information is supplied by completing all fields of the appropriate
metadata spreadsheet template, which can be downloaded from the
<a href="#GAtemplates">GEOarchive templates and examples</a> section below.
</td>
</tr>
<tr>
<th>Matrix table</th>
<td>
The matrix table is a spreadsheet containing the final, normalized values that are comparable
across rows and Samples, and preferably processed as described in any accompanying manuscript.
A complete data matrix should be supplied, not a summary subset.
It is possible to include additional data columns in the table, for example,
Affymetrix Detection calls and P-values, or background or flag columns.
See the Affymetrix template for an example.
</td>
</tr>
<tr>
<th>Raw data files</th>
<td>
In addition to the normalized data provided in the Matrix table, submitters are required to
provide raw data, usually in the form of supplementary raw data files. This facilitates the
unambiguous interpretation of the data and potential verification of the conclusions as
described in the <a href="/geo/info/MIAME.html">MIAME and MINSEQE</a> standards.<br />
Affymetrix submissions must include CEL files. Non-Affymetrix GEOarchive submissions should
include the original software-generated scan quantification files, for example, GenePix GPR files.
Next-generation sequence submissions must include files containing reads and quality scores.
</td>
</tr>
<tr>
<th>Platform</th>
<td>
If your experiments are performed using a commercial array (e.g., Affymetrix GeneChip) or other array already deposited in
GEO, please use the
<a href="/geo/browse/?view=platforms&amp;tool=findplatform">FIND PLATFORM</a>
tool to find the GEO accession number (GPLxxxx) for inclusion in the '<em>platform</em>'
column in the <em>SAMPLES</em> section of the metadata spreadsheet. If your array does not
already exist in GEO, please include a <em>PLATFORM</em> section in your metadata
spreadsheet and include Platform annotation columns in your matrix table. <br />
The Platform data must include meaningful, trackable sequence identifiers (e.g., GenBank/RefSeq accessions,
locus tags, clone IDs, oligo sequences, or chromosome locations; see the
<a href="/geo/info/platform.html">Platform content guidelines</a> for the full list).
References to in-house databases or top BLAST hits are not sufficient.
Platform submission is not necessary for SAGE or next-generation sequence submissions.
</td>
</tr>
</tbody>
</table>
<a name="submit" id="submit"></a>
<h2>How to submit<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
Bundle all parts (the Excel file containing the metadata spreadsheet and matrix table, plus the raw data files) together into
a .zip, .rar, or .tar archive using a program like WinZip. There are two options for transferring the resulting archive to GEO:
</p>
<ol>
<li>Use the web form to <a href="https://submit.ncbi.nlm.nih.gov/geo/submission/">Submit microarray or additional files to GEO</a>.</li>
<li>Use FTP for large submissions; see the <a href="/geo/info/submissionftp.html?type=nonseq">detailed FTP instructions</a>.</li>
</ol>
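<p>
As a rough illustration of the bundling step, the following Python sketch writes a set of submission parts into a single .zip archive using the standard <code>zipfile</code> module. The function name <code>bundle_submission</code>, the flat archive layout, and the file names in the usage note are assumptions made for this example, not GEO requirements; any program that produces a .zip, .rar, or .tar archive can be used instead.
</p>
<pre>
```python
import zipfile
from pathlib import Path


def bundle_submission(files, archive_path):
    """Bundle the given files into one .zip archive; return the stored names."""
    paths = [Path(f) for f in files]
    # Check every part up front so a missing file does not leave a partial archive.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError("missing submission parts: " + ", ".join(missing))
    # Assumption: a flat layout, with each file stored under its bare name.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=p.name)
    # Reopen the archive to report what was actually stored.
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()
```
</pre>
<p>
A call such as <code>bundle_submission(["metadata.xls", "sample1.CEL"], "submission.zip")</code> (hypothetical file names) returns the names stored in the archive, which can be compared against the files listed in the metadata spreadsheet before transfer. Because files are stored under their bare names, two inputs with the same file name in different directories would collide.
</p>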
<a name="GAtemplates" id="GAtemplates"></a>
<h2>GEOarchive templates and examples<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
The first step in creating your GEOarchive submission is to download the appropriate template (Excel spreadsheet)
from the list below. Each Excel file consists of several worksheets, including a metadata template, and examples of
metadata and matrix tables. Click the tabs at the bottom of the worksheet window to switch between worksheets.
Mouse over field names in the templates to view content guidelines.
</p>
<a name="microarray" id="microarray"></a>
<h2>Microarray<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
For the following microarray vendors, please download templates from the vendor-specific instructions pages:
</p>
<ul class="info-list">
<li><a href="/geo/info/geo_affy.html">Affymetrix submissions</a></li>
<li><a href="/geo/info/geo_agil.html">Agilent submissions</a></li>
<li><a href="/geo/info/geo_nimb.html">Nimblegen submissions</a></li>
<li><a href="/geo/info/geo_illu.html">Illumina submissions</a></li>
</ul>
<p>
For microarrays not from the vendors above, please use a 'Generic' template. For generic microarray submissions where the
Platform is already deposited in GEO, please download the most appropriate template:
</p>
<ul class="info-list">
<li><a href="/geo/info/examples/GA_single_ch.xls">Generic single channel submission template</a></li>
<li><a href="/geo/info/examples/GA_dual_ch.xls">Generic dual channel submission template</a></li>
<li><a href="/geo/info/examples/GA_merged_dye_swap.xls">Generic merged dye-swap submission template</a></li>
<li><a href="/geo/info/examples/GA_ChIP_chip.xls">Generic tiling ChIP-chip submission template</a></li>
</ul>
<p>
For generic microarray submissions where the Platform is not deposited in GEO, please download the most appropriate template:
</p>
<ul class="info-list">
<li><a href="/geo/info/examples/GA_single_ch_w_platf.xls">Generic single channel submission template, including Platform</a></li>
<li><a href="/geo/info/examples/GA_dual_ch_w_platf.xls">Generic dual channel submission template, including Platform</a></li>
<li><a href="/geo/info/examples/GA_merged_dye_swap_w_platf.xls">Generic merged dye-swap submission template, including Platform</a></li>
<li><a href="/geo/info/examples/GA_ChIP_chip_w_platf.xls">Generic tiling ChIP-chip submission template, including Platform</a></li>
</ul>
<p>
To submit only a Platform, please download the following template (this option is appropriate only if you have no hybridization or sequence data to deposit):
</p>
<ul class="info-list">
<li><a href="/geo/info/examples/GA_platform_only.xls">Platform-only template</a></li>
</ul>
<a name="seq" id="seq"></a>
<h2>High-throughput sequencing<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
For high-throughput sequence submissions, please refer to full instructions at:
</p>
<ul class="info-list">
<li><a href="/geo/info/seq.html">High-throughput sequence submissions</a></li>
</ul>
<a name="other" id="other"></a>
<h2>Other data types<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
For Xenium, MERFISH, NanoString nCounter®, or NanoString GeoMx® Digital Spatial Profiling studies with raw data in RCC format, please use one of the 'Generic single channel' templates as appropriate:
</p>
<ul class="info-list">
<li><a href="/geo/info/examples/GA_single_ch.xls">Generic single channel submission template</a></li>
<li><a href="/geo/info/examples/GA_single_ch_w_platf.xls">Generic single channel submission template, including Platform</a></li>
</ul>
<p>
For NanoString GeoMx® Digital Spatial Profiling studies with raw data in FASTQ format, please submit your study using GEO's high-throughput sequence data submission <a href="/geo/info/seq.html">instructions</a>.
</p>
<p>
For high-throughput RT-PCR submissions, please refer to full instructions at:
</p>
<ul class="info-list">
<li><a href="/geo/info/geo_rtpcr.html">RT-PCR submissions</a></li>
</ul>
<p>
For traditional SAGE submissions, please refer to full instructions at:
</p>
<ul class="info-list">
<li><a href="/geo/info/geo_sage.html">Traditional SAGE submissions</a></li>
</ul>
<a name="notes" id="notes"></a>
<h2>Notes for Microsoft Excel users<a class="arrow" href="#top" title="Back to top">Back to top</a></h2>
<p>
The following notes draw attention to common Excel-related problems.
</p>
<ul class="geo_doc_list">
<li><span>
Please be aware that Excel may automatically apply irreversible formatting to your data. According to Microsoft support:<br />
- If a number contains a slash mark (/) or hyphen (-), it may be converted to a date format.<br />
- If a number contains a colon (:), or is followed by a space and the letter A or P, it may be converted to a time format.<br />
- If a number contains the letter E (in uppercase or lowercase letters; for example, 10e5), or the number contains more characters than can be displayed based on the column width and font, the number may be converted to scientific notation, or exponential, format.<br />
- If a number contains leading zeros, the leading zeros are dropped.<br />
Certain clone identifiers, gene names, and plate coordinates are particularly susceptible to these issues. To avoid the problem, select the whole spreadsheet and apply Format -&gt; Cells -&gt; Number -&gt; Text before pasting data into Excel (the default format is "General").
For more information, see <a href="http://www.biomedcentral.com/1471-2105/5/80">Zeeberg et al., 2004</a>.
</span>
</li>
<li><span>
If you apply Format -&gt; Cells -&gt; Number -&gt; Text as described above, very long data strings (e.g., sequence data) may be shown as hash (#) characters. If this occurs, switch these cells back to "General" format.
</span>
</li>
</ul>
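<p>
The four conversion rules quoted above can be approximated with a small pre-check that runs before data is pasted into Excel. The Python sketch below is only a heuristic: the function name <code>excel_risks</code> and the regular expressions are assumptions written for this example and do not reproduce Excel's actual parsing logic. It covers numeric-looking values only, so it would not flag gene names that Excel may also convert.
</p>
<pre>
```python
import re

# Heuristic patterns approximating the four rules quoted above. They are
# assumptions for illustration, not Excel's real conversion logic.
RULES = [
    ("date", re.compile(r"^\d+[/-]\d+([/-]\d+)?$")),
    ("time", re.compile(r"^\d+(:\d+)+$|^\d+ [AaPp]$")),
    ("scientific", re.compile(r"^\d+(\.\d+)?[Ee]\d+$")),
    ("leading_zeros", re.compile(r"^0\d+$")),
]


def excel_risks(value):
    """Return the names of the rules that the given cell value matches."""
    text = str(value).strip()
    return [name for name, pattern in RULES if pattern.match(text)]


def flag_column(values):
    """Map each risky value in a column to the rules it matches."""
    flagged = {}
    for value in values:
        risks = excel_risks(value)
        if risks:
            flagged[value] = risks
    return flagged
```
</pre>
<p>
Running <code>flag_column</code> over an identifier column before import gives a list of values worth checking by eye after the spreadsheet is saved. An empty result does not guarantee that Excel left the data untouched.
</p>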
</div>
</div>
<div id="last_mod">
Last modified: August 15, 2024</div>
<div id="footer">
<span class="helpbar">|<a href="https://www.nlm.nih.gov"> NLM </a>|<a href="https://www.nih.gov"> NIH </a>|<a href="mailto:geo@ncbi.nlm.nih.gov"> Email GEO </a>|<a href="/geo/info/disclaimer.html"> Disclaimer </a>|<a href="https://www.nlm.nih.gov/accessibility.html"> Accessibility </a>|<a href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html"> HHS Vulnerability Disclosure </a>|
</span>
</div>
</div>
<script type="text/javascript" src="https://www.ncbi.nlm.nih.gov/portal/portal3rc.fcgi/rlib/js/InstrumentOmnitureBaseJS/InstrumentNCBIBaseJS/InstrumentPageStarterJS.js"></script>
</body>
</html>