<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>GEO File Transfer Protocol (FTP) - GEO - NCBI</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="author" content="geo" />
<meta name="keywords" content="NCBI, national institutes of health, nih, database, archive, central, bioinformatics, biomedicine, geo, gene, expression, omnibus, chips, microarrays, oligonucleotide, array, sage, CGH" />
<meta name="description" content="Gene Expression Omnibus (GEO) is a database repository of high throughput gene expression data and hybridization arrays, chips, microarrays." />
<meta name="ncbiaccordion" content="collapsible: true, active: false" />
<meta name="ncbi_app" content="geo" />
<meta name="ncbi_pdid" content="documentation" />
<meta name="ncbi_page" content="GEO File Transfer Protocol (FTP)" />
<link rel="shortcut icon" href="/geo/img/OmixIconBare.ico" />
<link rel="stylesheet" type="text/css" href="/geo/css/reset.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/nav.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/info.css" />
<script type="text/javascript" src="/core/jig/1.15.10/js/jig.min.js"></script>
<script type="text/javascript" src="/geo/js/dd_menu.js"></script>
<script type="text/javascript" src="/geo/js/info.js"></script>
<script type="text/javascript">
jQuery.getScript("/core/alerts/alerts.js", function () {
galert(['#crumbs_login_bar', 'body &gt; *:nth-child(1)'])
});
</script>
<script type="text/javascript">
var ncbi_startTime = new Date();
</script>
</head>
<body id="info" class="submissionftp">
<div id="all">
<div id="page">
<div id="header">
<div id="ncbi_logo">
<a href="/">
<img src="/geo/img/ncbi_logo.gif" alt="NCBI Logo" />
</a>
</div>
<div id="geo_logo">
<a href="/geo/"><img src="/geo/img/geo_main.gif" alt="GEO Logo" /></a>
</div>
</div>
<div id="nav_bar">
<ul id="geo_nav_bar">
<li><a href="#">GEO Publications</a>
<ul class="sublist">
<li><a href="/geo/info/GEOHandoutFinal.pdf">Handout</a></li>
<li><a href="/pmc/articles/PMC10767856/">NAR 2024 (latest)</a></li>
<li><a href="/pmc/articles/PMC99122/">NAR 2002 (original)</a></li>
<li><a href="/pmc/?term=10767856,4944384,3531084,3341798,3013736,2686538,2270403,1669752,1619900,1619899,539976,99122">All publications</a></li>
</ul>
</li>
<li><a href="/geo/info/faq.html">FAQ</a></li>
<li><a href="/geo/info/MIAME.html" title="Minimum Information About a Microarray Experiment">MIAME</a></li>
<li><a href="mailto:geo@ncbi.nlm.nih.gov">Email GEO</a></li>
</ul>
</div>
<div id="crumbs_login_bar"><a title="NCBI home page" href="/">NCBI</a> »
<a id="curr_page" title="GEO home page" href="/geo/">GEO</a> »
<a title="GEO documentation guide" href="/geo/info/">Info</a> »
<span>GEO File Transfer Protocol (FTP)</span><span id="login_status"><a href="/geo/submitter/" title="Click here to login. You need to do this only if you want to edit the contact information, submit data, see your unreleased data, or work with data already submitted by you. You do not need to login if you are here just to browse through public holdings">Login</a></span></div>
<div id="content">
<a name="top" id="top"></a>
<h1>GEO File Transfer Protocol (FTP)</h1>
<div class="submit_step login_to_show">
<a href="/geo/submitter/" title="Click here to log in.">Login</a> required to view the complete FTP instructions, including credentials and examples.
</div>
<div id="submit_instructions" style="display:none">
<div class="submit_step" id="no_folder">
<p>Step 1. Request personalized upload space for your GEO account (/<span class="userid">user</span>)</p>
<span class="button" id="request_folder_button">Create personalized upload space</span>
<span id="loading" style="display:none;"><img src="/geo/img/loading_blue.gif" width="24" height="24" /></span>
<span id="refresh_message" style="display:none"><p>Your personalized upload space is being created. Please refresh this page in a few seconds.</p></span>
</div>
<div id="with_folder" style="display:none">
<div class="submit_step">
<p><b>Step 1.</b> Your personalized upload space is:
<b>uploads/<span class="ftp-folder-name">personal_folder_name</span></b></p>
</div>
<div class="submit_step" id="datatype_selector">
<label>Select data you need to upload:</label>
<ul>
<li>
<input name="datatype" id="dt_hts" value="hts" type="radio" checked="checked" />
<label for="dt_hts">New submission of high-throughput sequencing (HTS) data</label>
</li>
<li>
<input name="datatype" id="dt_privacy" value="privacy" type="radio" />
<label for="dt_privacy">New submission of HTS study with patient-privacy concerns</label>
</li>
<li>
<input name="datatype" id="dt_requested" value="requested" type="radio" />
<label for="dt_requested">Upload additional files as requested by GEO staff for on-going submission</label>
</li>
<li>
<input name="datatype" id="dt_nonseq" value="nonseq" type="radio" />
<label for="dt_nonseq">Microarray and other (NanoString, RT-PCR, etc.)</label>
</li>
</ul>
</div>
<div class="submit_step">
<p class="hts">
<b>Step 2.</b> Transfer all your raw and processed data files to your personalized upload space according to FTP upload instructions below. <span class="red">Do not upload the metadata file by FTP.</span>
</p>
<p class="privacy hide">
<b>Step 2.</b> Transfer your metadata spreadsheet and processed data files to your personalized upload space according to FTP upload instructions below.
<br /><br />
After FTP transfer has completed,
<a href="https://submit.ncbi.nlm.nih.gov/geo/submission/">notify GEO of your submission</a>.
<br /><br />
In the comment box, please state that there are patient privacy concerns regarding public access to the raw data.
<br /><br />
<a href="https://www.ncbi.nlm.nih.gov/geo/info/faq.html#patient">More information on submitting data derived from human subjects</a>.
</p>
<p class="requested">
<b>Step 2.</b> Transfer the requested files (revised metadata spreadsheet, processed data files, or raw data files) to your personalized upload space according to FTP upload instructions below.
<br /><br />
After FTP transfer has completed,
<a href="https://submit.ncbi.nlm.nih.gov/geo/submission/">notify GEO of your submission</a>.
</p>
<p class="nonseq hide">
<b>Step 2.</b> Transfer your metadata spreadsheet, processed data files, and raw data files to your personalized upload space according to FTP upload instructions below.
<br /><br />
After FTP transfer has completed,
<a href="https://submit.ncbi.nlm.nih.gov/geo/submission/">notify GEO of your submission</a>.
</p>
<div data-jig="ncbiaccordion">
<h3>Transfer Files</h3>
<div>
<p class="red">GEO is a repository requiring both raw <b>*and*</b> processed data for submissions (see requirements <a href="/geo/info/seq.html">here</a>).
Raw file-only submissions should be made directly to <a href="/sra/docs/submit/">SRA</a>.</p>
<ol class="abc_list">
<li>
Do not begin transferring files until you have gathered all required components
(raw data files, processed data files, and metadata spreadsheet) and are ready to upload them.
The upload area has limited capacity, and we do not have the resources to store partial submissions.
</li>
<li>
Create a new folder on your computer with a meaningful name (e.g., geo_submission_RNAseq)
and place all of the data files for a dataset in that folder. If your submission consists of several datasets
(e.g., ChIPseq, RNAseq, HiC), please use a separate top-level folder for each dataset,
each containing all data files for that dataset as described <a href="/geo/info/seq.html#organizing">here</a>.
Do not create subfolders within a dataset folder.
All files must have unique names. Do not submit the metadata file by FTP.
</li>
<li>
<b>If the raw data files in your submission exceed 2 terabytes in total size, do not proceed with the upload.</b>
Instead, submit the raw data files directly to <a href="https://submit.ncbi.nlm.nih.gov/about/sra/">SRA</a>.
After you have received the SRA accessions, please see our <a href="/geo/info/seq.html#deposit">instructions</a> and
<a href="/geo/info/examples/seq_template_with_sra_accessions.xlsx">specific template</a> for this case,
and submit the metadata and processed data to GEO.
</li>
<li>
For PC/Mac OS users we recommend transferring files with the free third-party software, <a href="https://filezilla-project.org/download.php?show_all=1">FileZilla Client</a>.
Please see below for detailed examples and other options. <br /><br />
<span class="red">
IMPORTANT: When using FileZilla you must put your personalized upload space (/uploads/<span class="ftp-folder-name">personal_folder_name</span>)
in the remote site window to avoid the “Failed to retrieve directory listing” error that prevents file transfer. See an example in the
<a href="#filezilla" id="filezilla_link">Connecting with FileZilla</a> section below.
</span>
</li>
<li>
For LINUX/UNIX users, we recommend transferring files with 'ncftp' or 'lftp', but you can also use 'ftp', 'sftp', or 'ncftpput'. Please see below for detailed examples.
</li>
<li>
Our FTP server credentials are:
<table class="overview">
<tbody>
<tr><th>host address</th><td class="ftp-host"></td></tr>
<tr><th>username</th><td class="ftp-name"></td></tr>
<tr><th>password</th><td class="ftp-password"></td></tr>
</tbody>
</table>
<span class="red">
Do not share these log-in credentials. Do not include these log-in credentials on a public page.<br />
These credentials are changed regularly, as per our security policies.
</span>
</li>
<li>
After connecting, you <b>must navigate</b> to your personalized upload space:
uploads/<span class="ftp-folder-name">personal_folder_name</span>
</li>
<li>
After navigating to your personalized upload space, transfer the meaningfully-named submission folder from your computer to our server.
</li>
</ol>
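<p>
The 2-terabyte check described above can be scripted before any transfer starts. The following is a minimal sketch, not an official GEO tool. It assumes that the folder passed in holds only raw data files and that "2 terabytes" means 2 x 10^12 bytes (the instructions do not say whether decimal or binary units are meant). The names <code>total_size</code>, <code>exceeds_limit</code> and <code>LIMIT_BYTES</code> are hypothetical.
</p>
<pre>
```python
import os

# Assumption: "2 terabytes" is read as 2 * 10**12 bytes (decimal units).
LIMIT_BYTES = 2 * 10**12


def total_size(folder):
    """Return the combined size in bytes of all regular files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def exceeds_limit(folder, limit=LIMIT_BYTES):
    """Return True when the files under folder add up to more than limit bytes."""
    return total_size(folder) > limit
```
</pre>
<p>
For example, <code>exceeds_limit("geo_submission_RNAseq")</code> returning <code>True</code> would suggest that the raw data files belong in SRA rather than in the GEO upload space.
</p>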
</div>
</div>
</div>
<div class="submit_step requested hts">
<p>
<b>Step 3.</b> After FTP transfer of raw and processed data files is complete, upload the Excel metadata file on the <a href="https://submit.ncbi.nlm.nih.gov/geo/submission/meta/">Submit Metadata page</a>.
</p>
<span class="button"><a href="https://submit.ncbi.nlm.nih.gov/geo/submission/meta/">Upload metadata</a></span>
</div>
</div>
</div>
<div class="jig-ncbiaccordion" id="hints_accordion" style="display:none">
<h4>Hints and tips</h4>
<div>
<ul class="geo_doc_list">
<li><span>
Your upload should include three components: (1) raw data files, (2) processed data files,
and (3) completed Metadata Template.
</span></li>
<li><span>
Files can be compressed using gzip or bzip2 and may be submitted in a tar archive, but archiving
and/or compressing your files is not required. DO NOT USE ZIP!
</span></li>
<li><span>
File names should NOT include any sensitive information (these will appear publicly).
</span></li>
<li><span>
File names should be unique (DO NOT upload subdirectories containing identically-named files).
</span></li>
<li><span>
Avoid whitespace and special characters in file names. Use only alphanumeric characters [A-Z, a-z, 0-9],
underscores [_] and dots [.].
</span></li>
<li><span>
Do not apply gzip or bzip2 compression to binary files (.BigWig, .bw, .bigBed, .bb, .h5, .bam, .tdf, etc.).
</span></li>
<li><span>
For high-throughput sequencing submissions, we recommend providing the MD5 checksums for the
files that you are uploading (details below).
</span></li>
<li><span>
Please use passive &amp; binary modes when transferring files.
</span></li>
<li><span>
The FTP server is a temporary storage space. Files will be moved by curators to an internal
location for processing and assigning of accessions.
</span></li>
<li><span>
Files deposited on the FTP site are not displayed under 'My Submissions' on the web interface.
The web interface only displays accessioned submissions.
</span></li>
<li><span>
You must notify us after uploading your files. If you do not notify us, your files will be
automatically deleted from the server after two weeks.
</span></li>
</ul>
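<p>
Several of the file-naming hints above can be checked automatically before uploading. The sketch below is illustrative only and is not part of any GEO tooling. It covers three of the hints (allowed characters, unique names, no ZIP archives), and the function name <code>check_names</code> is hypothetical.
</p>
<pre>
```python
import os
import re
from collections import Counter

# Allowed characters per the hints above: letters, digits, underscores, dots.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_.]+")


def check_names(folder):
    """Return a list of human-readable problems found in file names under folder."""
    problems = []
    names = []
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            names.append(name)
            if not ALLOWED_NAME.fullmatch(name):
                problems.append("disallowed characters: " + name)
            if name.lower().endswith(".zip"):
                problems.append("ZIP archive: " + name)
    # Names must be unique across the whole folder tree, not just per directory.
    for name, count in sorted(Counter(names).items()):
        if count > 1:
            problems.append("duplicate file name: " + name)
    return problems
```
</pre>
<p>
An empty list means that none of these three problems was detected. The other hints, such as avoiding sensitive information in file names, still need a manual check.
</p>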
</div>
<h4>Connecting with FileZilla</h4>
<div>
<a name="filezilla" id="filezilla"></a>
<p>
You can quickly connect by entering the host (<span class="ftp-host"></span>), username (<span class="ftp-name"></span>), and password (<span class="ftp-password-hide"></span>) into the Quickconnect toolbar.
You will see an error <span class="red">** (Failed to retrieve directory listing)</span> with Quickconnect:
</p>
<img src="/geo/img/geoftp1.png" alt="Quickconnect error" />
<p>
Ignore this error. Enter the path to your personalized workspace in the 'Remote site' address bar (your path is: /uploads/<span class="ftp-folder-name"></span>):
</p>
<img src="/geo/img/geoftp2.png" alt="Remote site path" />
<p>
You can now transfer files by dragging your folder containing all submission files from the 'Local site' window and dropping into your personalized upload space ('Remote site' window).
</p>
<p>
Note that you can avoid the "directory listing" error by using the 'Site Manager' to create a session with your personalized workspace as the default
'Remote site' location, as follows (screenshots are FileZilla version 3.45.1):
</p>
<p>
Step 1. Go to 'Edit' -&gt; 'Settings' -&gt; 'Interface/Passwords' -&gt; toggle 'Save passwords' on
</p>
<img src="/geo/img/geoftp3.png" alt="Toggle passwords on" />
<p>
Step 2. Choose 'File' -&gt; 'Site Manager...'; Click on the 'New Site' button, enter a site name, and then enter the ftp credentials in the 'General' tab, selecting a 'Normal' logon:
</p>
<img src="/geo/img/geoftp4.png" alt="Site manager new site" />
<p>
Step 3. Enter the path to your personalized workspace in the 'Advanced' tab under 'Default remote directory' and then click 'OK' (your path is: /uploads/<span class="ftp-folder-name"></span>):
</p>
<img src="/geo/img/geoftp5.png" alt="Personalized workspace path" />
<p>
Step 4. Use the shortcut in the toolbar to connect to the server, or choose 'File' -&gt; 'Site Manager...', select the site under 'My sites' and then click 'Connect':
</p>
<img src="/geo/img/geoftp6.png" alt="Connect to server" />
</div>
<h4>Example Windows sessions</h4>
<div>
<p>Using free third-party software, <a href="https://filezilla-project.org/download.php?show_all=1">FileZilla Client</a></p>
<ol>
<li>
Log in to the server with:
<table class="overview">
<tbody>
<tr><th>host</th><td class="ftp-host"></td></tr>
<tr><th>username</th><td class="ftp-name"></td></tr>
<tr><th>password</th><td class="ftp-password-hide"></td></tr>
</tbody>
</table>
</li>
<li>
After connecting, you will see an error:<br />
<span class="red">
Error: Failed to retrieve directory listing
</span>
<br />[See the '<b>Connecting with FileZilla</b>' section for how to avoid receiving this error]
</li>
<li>
Replace '/' in the 'Remote site' address bar with '/uploads/<span class="ftp-folder-name">personal_folder_name</span>' (do not include quotes) and press 'Enter'
</li>
<li>
Drag your folder containing all submission files from the 'Local site' window and drop it into your personalized upload space ('Remote site' window)
</li>
</ol>
<p>Using Windows Explorer</p>
<ol>
<li>
Paste this URL into the address box: ftp://<span class="ftp-name"></span>:<span class="ftp-password-hide"></span>@<span class="ftp-host"></span>/uploads/<span class="ftp-folder-name">personal_folder_name</span>
</li>
<li>
Drag your submission folder containing all submission files from a different Windows Explorer window and drop it into your personalized upload space
</li>
</ol>
</div>
<h4>Example Mac OS sessions</h4>
<div>
<p>Using free third-party software, <a href="https://filezilla-project.org/download.php?show_all=1">FileZilla Client</a></p>
<ol>
<li>
Log in to the server with:
<table class="overview">
<tbody>
<tr><th>host</th><td class="ftp-host"></td></tr>
<tr><th>username</th><td class="ftp-name"></td></tr>
<tr><th>password</th><td class="ftp-password-hide"></td></tr>
</tbody>
</table>
</li>
<li>
After connecting, you will see an error:<br />
<span class="red">
Error: Failed to retrieve directory listing
</span>
<br />[See the '<b>Connecting with FileZilla</b>' section for how to avoid receiving this error]
</li>
<li>
Replace '/' in the 'Remote site' address bar with '/uploads/<span class="ftp-folder-name">personal_folder_name</span>' (do not include quotes) and press 'Enter'
</li>
<li>
Drag your folder containing all submission files from the 'Local site' window and drop it into your personalized upload space ('Remote site' window)
</li>
</ol>
<p>
Using Terminal window
</p>
<ol>
<li>Launch Terminal</li>
<li>Choose Shell &gt; New Remote Connection</li>
<li>Select 'Secure File Transfer (sftp)' in the Service list</li>
<li>Add (+) our server to the server list as: s<span class="ftp-host"></span> </li>
<li>In the User field, enter 'geoftp' (no quotes), then click Connect </li>
<li>At the prompt, enter the password: <span class="ftp-password-hide"></span></li>
<li>
Once connected:
<code>
cd uploads/<span class="ftp-folder-name">personal_folder_name</span><br />
mkdir new_geo_submission<br />
cd new_geo_submission<br />
</code>
</li>
<li>
Use 'lcd' to go to the local directory containing your submission files:
<code>lcd local_path_to_your_files</code>
</li>
<li>
Use the put command to place one file (or mput for multiple files) into the FTP directory:
<code>
put file_name<br />
mput *
</code>
</li>
</ol>
<p>Using free third-party software, <a href="https://cyberduck.io/download/">Cyberduck</a></p>
<ol>
<li>
Launch Cyberduck, select 'Bookmark -&gt; New Bookmark', and enter the following details:
<table class="overview">
<tbody>
<tr><th>nickname</th><td>GEOFTP (in this example)</td></tr>
<tr><th>server</th><td class="ftp-host"></td></tr>
<tr><th>username</th><td class="ftp-name"></td></tr>
<tr><th>password</th><td class="ftp-password-hide"></td></tr>
</tbody>
</table>
</li>
<li>
Click 'More Options' and enter the 'Path' to your personalized upload space as:<br />
uploads/<span class="ftp-folder-name">personal_folder_name</span>
</li>
<li>
To connect to the GEO server, go to 'Bookmarks' and choose 'GEOFTP'. You can now transfer files
by dragging your folder containing all submission files into the Cyberduck browser window.
</li>
</ol>
</div>
<h4>Example Linux/Unix sessions</h4>
<div>
<p>Using 'ncftp'</p>
<code>
ncftp<br />
set passive on<br />
set so-bufsize 33554432<br />
open <span class="ftp-url-hide"></span><br />
cd uploads/<span class="ftp-folder-name">personal_folder_name</span><br />
put -R Folder_with_submission_files<br />
</code>
<p>Using 'lftp'</p>
<code>
lftp <span class="ftp-url-hide"></span><br />
cd uploads/<span class="ftp-folder-name">personal_folder_name</span><br />
mirror -R Folder_with_submission_files
</code>
<p>Using 'sftp' (expect slower transfer speeds since this method encrypts on-the-fly)</p>
<code>
sftp <span class="ftp-name"></span>@s<span class="ftp-host"></span><br />
password: <span class="ftp-password-hide"></span><br />
cd uploads/<span class="ftp-folder-name">personal_folder_name</span><br />
mkdir new_geo_submission<br />
cd new_geo_submission<br />
put file_name<br />
</code>
<p>
Using 'ncftpput' (transfers from the command-line without entering an interactive shell)<br />
Usage example:
</p>
<code>ncftpput -F -R -z -u <span class="ftp-name"></span> -p "<span class="ftp-password-hide"></span>" <span class="ftp-host"></span> ./uploads/<span class="ftp-folder-name">personal_folder_name</span> ./local_dir_path</code>
<p>local_dir_path: path to the local submission directory you are transferring to your personalized upload space</p>
<p>
-F uses a passive (PASV) data connection<br />
-z resumes the upload if a file transfer is interrupted<br />
-R recursively uploads an entire directory tree
</p>
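<p>
The same transfer can also be scripted. The sketch below uses Python's standard <code>ftplib</code> module and is offered as an illustration under stated assumptions, not as a GEO-supported client. The helper names <code>connect</code> and <code>upload_folder</code> are hypothetical, the host, username and password must be the credentials shown on this page, and the sketch uploads a flat folder (no subfolders) with no resume support.
</p>
<pre>
```python
import os
from ftplib import FTP


def connect(host, user, password):
    """Open an FTP session and log in (hypothetical helper)."""
    ftp = FTP(host)
    ftp.login(user=user, passwd=password)
    return ftp


def upload_folder(ftp, remote_space, local_dir):
    """Upload each regular file in local_dir into remote_space/name-of-local_dir.

    Returns the list of uploaded file names in upload order.
    """
    folder_name = os.path.basename(os.path.normpath(local_dir))
    ftp.set_pasv(True)        # passive mode, as recommended in the hints
    ftp.cwd(remote_space)     # e.g. "uploads/personal_folder_name"
    ftp.mkd(folder_name)      # create the submission folder on the server
    ftp.cwd(folder_name)
    sent = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as handle:
                # storbinary switches the transfer to binary mode before sending
                ftp.storbinary("STOR " + name, handle)
            sent.append(name)
    return sent
```
</pre>
<p>
A typical call would be <code>upload_folder(connect(host, user, password), "uploads/personal_folder_name", "geo_submission_RNAseq")</code>, followed by <code>quit()</code> on the session. For very large files, a client with resume support such as 'ncftpput -z' or 'lftp' is likely the more robust choice.
</p>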
</div>
<h4>MD5 Checksum</h4>
<div>
<p>
An MD5 checksum is a 32-character hexadecimal string (e.g. 7da8a01243d3ac4a4f0aa02a172bd476)
that can be computed for each file with the native command-line tools md5 (Mac OS X) or md5sum (Linux).
Windows users can use the built-in certutil command (certutil -hashfile file_name MD5), install
<a href="https://support.microsoft.com/en-us/help/841290/availability-and-description-of-the-file-checksum-integrity-verifier-u">Microsoft's File Checksum Integrity Verifier (FCIV) utility</a>,
or download a third-party utility to compute MD5 checksums (e.g. <a href="http://implbits.com/products/hashtab/">Hashtab</a>).
Checksums allow us to identify files that did not transfer successfully.
They can be included in the Metadata Template or as one or more plain text files in your upload.
</p>
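<p>
As a platform-independent alternative, checksums can be computed with Python's standard <code>hashlib</code> module. The following is a minimal sketch. The function names and the output file name <code>md5_checksums.txt</code> are hypothetical, and each output line follows the "checksum, two spaces, file name" layout that md5sum prints.
</p>
<pre>
```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(folder, out_name="md5_checksums.txt"):
    """Write one 'checksum  file_name' line per file in folder; return the lines."""
    lines = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip subdirectories and the checksum list itself.
        if os.path.isfile(path) and name != out_name:
            lines.append(md5_of_file(path) + "  " + name)
    with open(os.path.join(folder, out_name), "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines
```
</pre>
<p>
Calling <code>write_checksums("geo_submission_RNAseq")</code> writes the list into the submission folder, where it can be uploaded as a plain text file alongside the data.
</p>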
</div>
<h4>Troubleshooting FTP</h4>
<div>
<p>
If you are having trouble with your FTP connection to NCBI, try the following:
</p>
<ol>
<li>Set passive mode rather than active mode</li>
<li>Increase the FTP buffer size to 32 MB</li>
<li>Try another host or another platform (e.g., Windows instead of Unix)</li>
<li>Try a different FTP client</li>
<li>Consult your IT/systems group for assistance with your firewall configuration settings</li>
</ol>
</div>
</div>
</div>
</div>
<div id="last_mod">
Last modified: October 3, 2024</div>
<div id="footer">
<span class="helpbar">|<a href="https://www.nlm.nih.gov"> NLM </a>|<a href="https://www.nih.gov"> NIH </a>|<a href="mailto:geo@ncbi.nlm.nih.gov"> Email GEO </a>|<a href="/geo/info/disclaimer.html"> Disclaimer </a>|<a href="https://www.nlm.nih.gov/accessibility.html"> Accessibility </a>|<a href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html"> HHS Vulnerability Disclosure </a>|
</span>
</div>
</div>
<script type="text/javascript" src="https://www.ncbi.nlm.nih.gov/portal/portal3rc.fcgi/rlib/js/InstrumentOmnitureBaseJS/InstrumentNCBIBaseJS/InstrumentPageStarterJS.js"></script>
</body>
</html>