<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Frequently Asked Questions - GEO - NCBI</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="author" content="geo" />
<meta name="keywords" content="NCBI, national institutes of health, nih, database, archive, central, bioinformatics, biomedicine, geo, gene, expression, omnibus, chips, microarrays, oligonucleotide, array, sage, CGH" />
<meta name="description" content="Gene Expression Omnibus (GEO) is a database repository of high throughput gene expression data and hybridization arrays, chips, microarrays." />
<meta name="ncbiaccordion" content="collapsible: true, active: false" />
<meta name="ncbi_app" content="geo" />
<meta name="ncbi_pdid" content="documentation" />
<meta name="ncbi_page" content="Frequently Asked Questions" />
<link rel="shortcut icon" href="/geo/img/OmixIconBare.ico" />
<link rel="stylesheet" type="text/css" href="/geo/css/reset.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/nav.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/info.css" />
<script type="text/javascript" src="/core/jig/1.15.10/js/jig.min.js"></script>
<script type="text/javascript" src="/geo/js/dd_menu.js"></script>
<script type="text/javascript" src="/geo/js/info.js"></script>
<script type="text/javascript">
jQuery.getScript("/core/alerts/alerts.js", function () {
galert(['#crumbs_login_bar', 'body &gt; *:nth-child(1)'])
});
</script>
<script type="text/javascript">
var ncbi_startTime = new Date();
</script>
</head>
<body id="info" class="faq">
<div id="all">
<div id="page">
<div id="header">
<div id="ncbi_logo">
<a href="/">
<img src="/geo/img/ncbi_logo.gif" alt="NCBI Logo" />
</a>
</div>
<div id="geo_logo">
<a href="/geo/"><img src="/geo/img/geo_main.gif" alt="GEO Logo" /></a>
</div>
</div>
<div id="nav_bar">
<ul id="geo_nav_bar">
<li><a href="#">GEO Publications</a>
<ul class="sublist">
<li><a href="/geo/info/GEOHandoutFinal.pdf">Handout</a></li>
<li><a href="/pmc/articles/PMC10767856/">NAR 2024 (latest)</a></li>
<li><a href="/pmc/articles/PMC99122/">NAR 2002 (original)</a></li>
<li><a href="/pmc/?term=10767856,4944384,3531084,3341798,3013736,2686538,2270403,1669752,1619900,1619899,539976,99122">All publications</a></li>
</ul>
</li>
<li><a href="/geo/info/faq.html">FAQ</a></li>
<li><a href="/geo/info/MIAME.html" title="Minimum Information About a Microarray Experiment">MIAME</a></li>
<li><a href="mailto:geo@ncbi.nlm.nih.gov">Email GEO</a></li>
</ul>
</div>
<div id="crumbs_login_bar"><a title="NCBI home page" href="/">NCBI</a> »
<a id="curr_page" title="GEO home page" href="/geo/">GEO</a> »
<a title="GEO documentation guide" href="/geo/info/">Info</a> »
<span>Frequently Asked Questions</span><span id="login_status"><a href="/geo/submitter/" title="Click here to login. You need to do this only if you want to edit the contact information, submit data, see your unreleased data, or work with data already submitted by you. You do not need to login if you are here just to browse through public holdings">Login</a></span></div>
<div id="content" class="faq">
<a id="top"></a>
<h1>Frequently Asked Questions</h1>
<h2>Submission</h2>
<ul class="faq_list">
<li><a href="#what">What is GEO?</a></li>
<li><a href="#why">Why should I submit my data to GEO?</a></li>
<li><a href="#deposit">How do I submit my data to GEO?</a></li>
<li><a href="#when">When do I submit my data to GEO?</a></li>
<li><a href="#whenaccessions">When will my data receive GEO accession numbers?</a></li>
<li><a href="#kinds">What kinds of data will GEO accept?</a></li>
<li><a href="#rawdata">Does GEO store raw data?</a></li>
<li><a href="#summarysubset">Can I submit an extracted or summary subset of data?</a></li>
<li><a href="#submittersauth">How do I create a GEO account?</a></li>
<li><a href="#contactinformation">How can I make edits to my contact information?</a></li>
<li><a href="#facilityaccount">I run a facility and need to submit data for multiple investigators. What account should I use?</a></li>
<li><a href="#holduntilpublished">Can I keep my data private while my manuscript is being prepared or under review?</a></li>
<li><a href="#holdprivate">Can I keep my data private after my manuscript is published?</a></li>
<li><a href="#revieweraccess">How can I allow reviewers access to my private records?</a></li>
<li><a href="#corrections">How can I make corrections to data that I already submitted?</a></li>
<li><a href="#delete">How can I delete my records?</a></li>
<li><a href="#reviewer">I'm a reviewer, how do I access and evaluate pre-publication data?</a></li>
<li><a href="#MIAME">Does GEO support MIAME and MINSEQE standards?</a></li>
<li><a href="#patient">Human Subject Guidelines: Can I submit data derived from human subjects?</a></li>
</ul>
<h2>Query and search</h2>
<ul class="faq_list">
<li><a href="#restrictions">Who can use GEO data?</a></li>
<li><a href="#retrievals">What kinds of retrievals are possible in GEO?</a></li>
<li><a href="#analyze">How can I query and analyze GEO data?</a></li>
<li><a href="#prog">Can GEO data be accessed programmatically?</a></li>
<li><a href="#notifications">Can I get notified when new data is available?</a></li>
<li><a href="#usage">Can I cite data I find in GEO as evidence to support my own research?</a></li>
<li><a href="#seriesdataset">What is the difference between a Series and a DataSet?</a></li>
<li><a href="#profileclusters">Why can't I find gene profile charts or clusters for my study of interest?</a></li>
<li><a href="#redblue">What do the red bars and blue squares represent in GEO profile charts?</a></li>
<li><a href="#seq">What data types are provided with next-generation sequence submissions?</a></li>
</ul>
<h2>Submission</h2>
<a id="what"></a>
<h3>What is GEO?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes
comprehensive sets of microarray, next-generation sequencing, and other forms of high-throughput
functional genomic data submitted by the scientific community. In addition to storing data,
GEO provides a collection of web-based interfaces and applications to help users query and
download the studies and gene expression patterns stored in GEO. For more information about various aspects of GEO,
please see our <a href="/geo/info/">documentation</a> listings and
<a href="/pmc/?term=3013736,2686538,2270403,1669752,1619900,1619899,539976,99122">publications</a>.
</p>
<a id="why"></a>
<h3>Why should I submit my data to GEO?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
There are several good reasons for submitting your data to us. The most likely reason is that
the funder of your research or the journal
in which you are publishing your research requires deposit of microarray or sequence data to a
<a href="/geo/info/MIAME.html">MIAME- or MINSEQE</a>-compliant
public repository like GEO. In addition to satisfying funder and journal requirements for publication,
there are other significant benefits to depositing data with GEO. Your data receive long-term
archiving at a centralized repository, and are integrated with other NCBI resources which afford
increased usability and visibility. You may also include links back to your own project websites
within your submission, again increasing visibility of your research. Journal publication is
not a requirement for data submission to GEO.
</p>
<a id="deposit"></a>
<h3>How do I submit my data to GEO?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Submitters should first log in through their <a href="https://account.ncbi.nlm.nih.gov/">NCBI</a> account.
If you don't have an NCBI account, you can create one <a href="https://account.ncbi.nlm.nih.gov/signup/">here</a>.
Submitters are then asked to complete a <a href="/geo/submitter/">My GEO Profile form</a>
that provides the contact information to be used by GEO curators to communicate about
the submission and to be displayed on the GEO records. All submitters are asked to supply
raw data, processed data, and descriptive information about the samples, protocols and overall
study in a supported deposit format. Follow the relevant link for your data type on the
<a href="/geo/info/submission.html">Submitting data</a> page to find submission instructions.
Submitters of high throughput sequencing data can <a href="https://youtu.be/RqkRPcF38Lw">watch a tutorial video</a> on "How to submit to GEO".
We endeavor to make data
deposit procedures as straightforward as possible and will provide as much assistance as
you require to get your data submitted to GEO. If you have problems or questions about
the submission procedures, <a href="mailto:geo@ncbi.nlm.nih.gov">e-mail us</a> and one of our
curators will quickly get back to you.
</p>
<a id="when"></a>
<h3>When do I submit my data to GEO?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Many journals require accession numbers for microarray or sequence data before acceptance of a paper for publication.
Also, reviewers and editors may need access to your data during the review process.
Thus, data should be deposited in GEO before a manuscript describing the data is sent to a journal for review.
GEO processing time is approximately 5 business days after completion of submission,
but may take longer around
<a href="https://en.wikipedia.org/wiki/Federal_holidays_in_the_United_States">federal holidays</a>, so it is important to
make your submission well in advance of when you require the accession numbers for your manuscript.
Your records may <a href="#holduntilpublished">remain private</a> until your manuscript (or preprint) is publicly available.
Once your submissions have been approved by GEO staff, you can <a href="/geo/info/linking.html">cite</a> the GEO accession number(s)
in your manuscript and you can generate a reviewer <a href="#revieweraccess">access token</a>
by which editors and reviewers can access your private GEO records.
</p>
<a id="whenaccessions"></a>
<h3>When will my data receive GEO accession numbers?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Processing normally takes approximately 5 business days after completion of submission,
but may take longer around <a href="https://en.wikipedia.org/wiki/Federal_holidays_in_the_United_States">federal holidays</a>.
After you complete the submission, your data are put into a queue to await review by a curator.
Please understand that we receive hundreds of study submissions per week, and processing times can vary depending on submission volume.
Thus, it is important to make your submission well in advance of when you require the accession numbers for your manuscript.
If format or content problems are identified with your submission, a curator will contact you by e-mail explaining how to address the issue.
Please address the issues raised by curators; failure to do so may result in processing delays or removal of the records.
Once your records pass review, the curator will send you an e-mail confirming your GEO accession numbers and their release dates.
If you do not receive an e-mail from us within 5 business days of your submission, please first check your spam or junk e-mail folders because
some systems recognize GEO e-mail correspondence as spam, then <a href="mailto:geo@ncbi.nlm.nih.gov">e-mail us</a>
to inquire about your submission. Do not quote GEO accession numbers in manuscripts until you have received an approval e-mail
notice from a GEO curator.
</p>
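<p>
For planning purposes, the 5-business-day estimate above can be turned into a rough calendar date.
The following is a minimal sketch, not a GEO tool: the function name <code>add_business_days</code> is hypothetical,
and the calculation skips weekends only. It does not account for federal holidays or for variation in
submission volume, both of which can extend processing times.
</p>

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date that falls `days` business days after `start`.

    Assumption: only Saturdays and Sundays are skipped; federal
    holidays are NOT taken into account.
    """
    current = start
    remaining = days
    while remaining != 0:
        current += timedelta(days=1)
        # weekday() returns 5 for Saturday and 6 for Sunday.
        if current.weekday() not in (5, 6):
            remaining -= 1
    return current


# Example: a submission completed on Friday, 1 March 2024.
print(add_business_days(date(2024, 3, 1), 5))
```

<p>
Treat the result as the earliest plausible date, and submit well before you need the accession numbers.
</p>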
<a id="kinds"></a>
<h3>What kinds of data will GEO accept?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
GEO was designed around the common features of most of the high-throughput and
parallel molecular abundance-measuring technologies in use today. These include data generated
from microarray and high-throughput sequence technologies, for example:
</p>
<div class="geo_info_list">
<ul id="geo_data_types">
<li>
<span class="list_text">
Gene expression profiling by microarray or next-generation sequencing
(see <a href="/gds?term=(expression+profiling+by+array[DataSet+Type]+OR+expression+profiling+by+genome+tiling+array[DataSet+Type]+OR+expression+profiling+by+high+throughput+sequencing[DataSet+Type])+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
Non-coding RNA profiling by microarray or next-generation sequencing
(see <a href="/gds?term=(non+coding+rna+profiling+by+array[DataSet+Type]+OR+non+coding+rna+profiling+by+genome+tiling+array[DataSet+Type]+OR+non+coding+rna+profiling+by+high+throughput+sequencing[DataSet+Type])+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
Chromatin immunoprecipitation (ChIP) profiling by microarray or next-generation sequencing
(see <a href="/gds?term=(genome+binding/occupancy+profiling+by+array[DataSet+Type]+OR+genome+binding/occupancy+profiling+by+genome+tiling+array[DataSet+Type]+OR+genome+binding/occupancy+profiling+by+high+throughput+sequencing[DataSet+Type])+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
Genome methylation profiling by microarray or next-generation sequencing
(see <a href="/gds?term=(methylation+profiling+by+array[DataSet+Type]+OR+methylation+profiling+by+genome+tiling+array[DataSet+Type]+OR+methylation+profiling+by+high+throughput+sequencing[DataSet+Type])+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
High-throughput RT-PCR
(see <a href="/gds?term=%22expression+profiling+by+rt+pcr%22[DataSet+Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
Genome variation profiling by array (arrayCGH)
(see <a href="/gds?term=(genome+variation+profiling+by+array[DataSet+Type]+OR+genome+variation+profiling+by+genome+tiling+array[DataSet+Type]+OR+genome+variation+profiling+by+snp+array[DataSet+Type])+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
SNP arrays
(see <a href="/gds?term=snp+genotyping+by+snp+array[DataSet+Type]+AND+gse[Entry Type]">examples</a>) (see <a href="#patient">human subject FAQ</a>)
</span>
</li>
<li>
<span class="list_text">
Serial Analysis of Gene Expression (SAGE)
(see <a href="/gds?term=expression+profiling+by+sage[DataSet+Type]+AND+gse[Entry Type]">examples</a>)
</span>
</li>
<li>
<span class="list_text">
Protein arrays
(see <a href="/gds?term=protein+profiling+by+protein+array[DataSet+Type]+AND+gse[Entry Type]">examples</a>)
</span>
</li>
</ul>
</div>
<p>
The GEO database has a flexible and open design that
is responsive to developing trends. If you have questions about whether
GEO can accept your data type, please do not hesitate to
<a href="mailto:geo@ncbi.nlm.nih.gov">contact us</a>.
</p>
<a id="rawdata"></a>
<h3>Does GEO store raw data?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. GEO requires raw data, processed data and metadata. Raw data facilitates the unambiguous interpretation
of the data and potential verification of conclusions. For microarray data, raw data may be supplied either
within the Sample record data tables or as external supplementary data files, e.g., Affymetrix CEL.
For high-throughput sequencing, GEO brokers the complete set of raw data files, e.g., FASTQ, to the
SRA database on your behalf.
</p>
<a id="summarysubset"></a>
<h3>Can I submit an extracted or summary subset of data?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
No. Complete, unfiltered data sets should be supplied. This includes full hybridization tables,
genome-wide sequence results, fully annotated samples, and meaningful, trackable sequence identifier
information in Platform records and processed sequence data files. The principal reason we maintain this archive
and the rationale behind many journals' requirement for data deposit into GEO is so that the community can access
and comprehensively re-examine data that form the basis of scientific reporting. Therefore, we do not accept partial
or heavily filtered data sets. We do understand the various reasons and difficulties some researchers have with sharing data.
However, the demand from users and journal editors, together with our need to maintain a useful and transparent database has
led to our policy of only accepting complete data sets.
</p>
<a id="submittersauth"></a>
<h3>How do I create a GEO account?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
You will need both an <a href="https://account.ncbi.nlm.nih.gov/">NCBI</a> account and an accompanying
<a href="/geo/submitter">My GEO Profile</a> to submit data. First, log in through your NCBI account.
If you don't have an NCBI account, you can create one <a href="https://account.ncbi.nlm.nih.gov/signup/">here</a>.
Submitters are then asked to complete a My GEO Profile form that provides the contact
information to be used by GEO curators to communicate about the submission and to be displayed on the GEO records.
The NCBI account can be used to submit additional data in the future without re-entering contact
information, as well as to authenticate the submitter when updating or editing an existing GEO record.
</p>
<a id="contactinformation"></a>
<h3>How can I make edits to my contact information?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
After logging in to your <a href="https://account.ncbi.nlm.nih.gov/">NCBI</a> account, follow the
<a href="/geo/submitter">My GEO Profile</a> link on the <a href="/geo/">home page</a>.
Edits to contact information will be applied immediately to all existing records submitted under that account.
If you need the contact information to remain unedited on existing records, but different contact details to appear on new records,
it is necessary to open a separate account and submit new data under that account.
</p>
<a id="facilityaccount"></a>
<h3>I run a facility and need to submit data for multiple investigators. What account should I use?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
You have three choices when submitting data on behalf of others:
</p>
<ol>
<li>
Create a separate GEO Profile for each investigator for whom you will be submitting data.
Each Profile will need a separate NCBI account. When you create each GEO Profile, you can
add both the investigator's e-mail address and your own. In this case, both addresses will
receive e-mail correspondence from GEO, but only the e-mail address of the investigator will be
displayed on the GEO records.
</li>
<li>
Submit the data under your own GEO Profile. When the submission is approved, you can ask
us to transfer the submission to the investigator's GEO Profile (you must first ask them to create
their own GEO Profile and to provide you with their GEO username). In this case, you will receive e-mail
correspondence from GEO up until the time the data are moved to the investigator's Profile.
</li>
<li>
Maintain one 'Facility' account and include the investigator names as 'Contributors' on their records.
For example, see <a href="/geo/query/acc.cgi?acc=GSE40272">this record</a> submitted by the
Stanford Microarray Database on behalf of one of their investigators. In this case, only the facility
will receive e-mail correspondence from GEO.
</li>
</ol>
<a id="holduntilpublished"></a>
<h3>Can I keep my data private while my manuscript is being prepared or under review?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. GEO records may remain private until a manuscript (including preprint) quoting the GEO accession number is made available to
the public (journal publication is not a requirement for data submission to GEO). During the submission process,
you are prompted to specify a release date for your records. The release date is the date on which your data are
made public and will be available for anyone to access, download and re-use.
Therefore, it is very important that all your collaborators agree on the release date.
Although the maximum allowable limit is four years, this date may be brought forward or pushed back at any time; see
<a href="/geo/info/update.html#changedate">Change the release date of your private records</a> for instructions
on how to change the release date. This feature allows a submitter to deposit data and receive a GEO accession
number to quote in a manuscript before the data become public. We will send you an e-mail reminder 10 days before
the scheduled release date, inviting you to postpone the release date as necessary. It is important to
inform us as soon as your manuscript or preprint is published so that we can release your records and link them with PubMed.
Submitters also have the opportunity to create a reviewer token that allows collaborators or reviewers
confidential, read-only access to private data before manuscript publication.
</p>
<a id="holdprivate"></a>
<h3>Can I keep my data private after my manuscript is published?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
No. If GEO accession numbers are quoted in a manuscript, including publicly posted unpublished preprints
through servers like bioRxiv, the records must be released so that the data are accessible to the scientific
community. Even if the preprint is intended to be temporary, if the accession is cited, the data must be released.
If GEO accession numbers are found to be quoted in any publication or preprint before the scheduled
release date, GEO staff are obligated to release those records, even if a second manuscript describing the
same data is pending.
</p>
<a id="revieweraccess"></a>
<h3>How can I allow reviewers access to my private records?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
After your records have been approved, use the <i>Reviewer access</i> link near the top of your Series
(GSExxx) record to create a reviewer token which provides anonymous, read-only access to your private submissions.
The token can be sent to the journal editor who will circulate it to reviewers requiring access to your private data.
This method provides access to all private data except sequence files submitted to SRA.
SRA does not currently support access to private sequence data, but if necessary, you can <a href="mailto:sra@ncbi.nlm.nih.gov">e-mail SRA</a>
to request a reviewer metadata link.
</p>
<a id="corrections"></a>
<h3>How can I make corrections to data that I already submitted?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
You may perform updates and edits at any time to any of your submissions.
Please refer to the <a href="/geo/info/update.html">Updating your GEO records</a> page for instructions.
Be aware that updates can take several business days to complete, and may take longer around
<a href="https://en.wikipedia.org/wiki/Federal_holidays_in_the_United_States">federal holidays</a>,
so it is important to make your update well in advance of when you require it to be implemented.
Also, for sequence data, note that the corresponding raw data records in SRA follow the
<a href="/sra/docs/sequence-data-processing/">NLM GenBank and SRA data Processing</a>
procedures for status changes.
</p>
<a id="delete"></a>
<h3>How can I delete my records?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Only GEO staff can remove records from the database; it is necessary to <a href="mailto:geo@ncbi.nlm.nih.gov?subject=Delete">e-mail us</a>
to request deletion of specific accession numbers. Please keep in mind that <a href="#corrections">updating records</a>
is preferable to deleting records, if appropriate. If the accession numbers in question have been published in a manuscript, including a preprint,
we cannot delete the records. Rather, a comment will be added to the record indicating the reason the submitter
requested withdrawal of the data, and the record content will be adjusted/deleted accordingly.
Also, for sequence data, note that the corresponding raw data records in SRA follow the
<a href="/sra/docs/sequence-data-processing/">NLM GenBank and SRA data Processing</a>
procedures for status changes.
</p>
<a id="reviewer"></a>
<h3>I'm a reviewer, how do I access and evaluate pre-publication data?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Reviewers should expect to receive a <i>reviewer token</i> with the manuscript.
This token allows anonymous, read-only access to the private GEO records cited in the manuscript.
Detailed information is provided in these <a href="/geo/info/reviewer.html">Guidelines for reviewers and journal editors</a>.
</p>
<a id="MIAME"></a>
<h3>Does GEO support MIAME and MINSEQE?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. GEO encourages submitters to supply <a href="/geo/info/MIAME.html">MIAME-</a> and
<a href="https://www.fged.org/projects/minseqe/">MINSEQE</a>-compliant data.
GEO submission procedures are designed to closely follow the MIAME and MINSEQE checklists;
if you provide all requested information, your submission will be compliant.
Note that MIAME and MINSEQE compliance is determined by the <em>content</em> provided, not by the
submission <em>format</em> or route.
</p>
<a id="patient"></a>
<h3>Human Subject Guidelines: Can I submit data derived from human subjects?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
If your data need controlled access, deposit your data with NCBI's <a href="/gap/">dbGaP</a> database.
</p>
<p>
GEO is an unrestricted-access database. Please read the following guidelines for Human Genomic Data
Submitted to Unrestricted-Access Repositories.
</p>
<p>
<em>For NIH-funded studies</em>: If you plan to submit <a href="https://osp.od.nih.gov/wp-content/uploads/Supplemental_Info_GDS_Policy.pdf">large-scale human genomic data</a>,
as defined by the <a href="https://osp.od.nih.gov/wp-content/uploads/NIH_GDS_Policy.pdf">NIH Genomic Data Sharing (GDS) Policy</a>,
to be maintained in an unrestricted-access NCBI database, NIH expects you to 1) have an
<a href="https://osp.od.nih.gov/scientific-sharing/institutional-certifications/">Institutional Certification</a>
to assure that the data submission and expectations defined in the NIH GDS Policy have been met (this Certification does not need to be submitted to GEO),
and 2) register the study in NCBI <a href="/bioproject/">BioProject</a>
regardless of where the data will ultimately reside (e.g., GenBank, SRA, or GEO). If you submit to GEO, we will register a BioProject on your behalf.
If you have any questions about whether your research is subject to the NIH GDS Policy,
please contact the relevant NIH Program Official and/or the
<a href="https://osp.od.nih.gov/wp-content/uploads/IC_GPAs.pdf">Genomic Program Administrator</a>.
If you plan to submit genomic data from human specimens that would not be considered large-scale,
it is your responsibility to ensure that the submitted information does not compromise participant
privacy, and is in accord with the original consent, in addition to all applicable laws, regulations,
and institutional policies. GEO is not able to help interpret your consent forms;
instead, you should consult with your institutional review board (IRB) on that.
</p>
<p>
<em>For non-NIH-funded studies</em>: If your study is not NIH-funded, you are not required to comply with
the GDS Policy, but you must have the appropriate consent/permission to submit the data to a public database like GEO.
GEO is not able to help interpret your consent forms; instead, you should consult with your institutional review board (IRB) on that.
It is your responsibility to ensure that the submitted information does not compromise participant privacy
and is in accord with the original consent in addition to all applicable laws, regulations, and institutional policies.
If you do not have consent to make the data fully public in a database like GEO, you can
<a href="https://osp.od.nih.gov/wp-content/uploads/Expectations_for_Non-NIH_Funded_Submission_Requests.pdf">apply to the NIH Office of Science Policy</a>
to find an NIH Institute that will sponsor your study in NCBI's <a href="/gap/">dbGaP</a> database.
dbGaP has controlled-access mechanisms and is an appropriate resource for hosting sensitive patient data.
The sponsor would create a Data Access Request and Use Certification and define use restrictions for use
in approving data access requests.
</p>
<h2>Query and search</h2>
<a id="restrictions"></a>
<h3>Who can use GEO data?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Anybody can access and download public GEO data. There are no login requirements. For more information,
please read these <a href="/About/disclaimer.html">copyright</a> and
<a href="/geo/info/disclaimer.html">data disclaimers</a>.
</p>
<a id="retrievals"></a>
<h3>What kinds of retrievals are possible in GEO?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
There are several ways to retrieve GEO data; please see the <a href="/geo/info/overview.html#query">Query and analysis</a> overview
and the <a href="/geo/info/download.html">Download GEO data</a> instructions for details.
These methods include performing simple or sophisticated
<a href="/geo/info/qqtutorial.html">queries</a> of the <a href="/gds/">GEO DataSets</a> and
<a href="/geoprofiles/">GEO Profiles</a> databases,
entering a valid GEO accession
number in the <a href="/geo/query/acc.cgi">Accession Display</a>
bar, browsing the list of current <a href="/geo/summary/">GEO repository contents</a>, and
downloading data from the GEO
<a href="ftp://ftp.ncbi.nlm.nih.gov/geo/">FTP site</a>.
</p>
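<p>
For FTP downloads, it can help to compute the directory for a given accession rather than browse for it.
The sketch below assumes the commonly used layout in which records are grouped by thousands
(the last three digits of the accession number are replaced by <code>nnn</code>) under the folders
<code>series</code>, <code>platforms</code>, <code>samples</code> and <code>datasets</code>. That layout and the
function name <code>geo_ftp_dir</code> are assumptions not stated in this FAQ; please confirm them against the
Download GEO data instructions before relying on them.
</p>

```python
def geo_ftp_dir(accession):
    """Return the assumed FTP directory for a GEO accession such as GSE40272."""
    # Assumed mapping from accession prefix to top-level FTP folder.
    folders = {
        "GSE": "series",
        "GPL": "platforms",
        "GSM": "samples",
        "GDS": "datasets",
    }
    prefix, digits = accession[:3].upper(), accession[3:]
    if prefix not in folders or not digits.isdigit():
        raise ValueError("not a GEO accession: " + accession)
    # Records are grouped by thousands: drop the last three digits
    # and append "nnn" (GSE40272 goes under GSE40nnn, GPL96 under GPLnnn).
    stub = prefix + digits[:-3] + "nnn"
    return "ftp://ftp.ncbi.nlm.nih.gov/geo/{}/{}/{}{}/".format(
        folders[prefix], stub, prefix, digits
    )


print(geo_ftp_dir("GSE40272"))
print(geo_ftp_dir("GPL96"))
```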
<a id="analyze"></a>
<h3>How can I query and analyze GEO data?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Once you have found a curated DataSet or Series of interest, there are several features
available that help identify interesting gene expression profiles within that study.
Some RNA-seq studies and most microarray studies can be analyzed with <a href="/geo/geo2r/">GEO2R</a>.
GEO2R is a web application that can be used to compare 2 or more groups of Samples,
and identify and plot differentially expressed genes. All records analyzable with
GEO2R can be retrieved by searching with <a href="/gds?term=&quot;geo2r&quot;[Filter]">"geo2r"[Filter]</a>.
Alternatively, there are some curated DataSets that include a <a href="/geo/info/datasets.html#findgenes">find genes</a> feature,
<a href="/geo/info/datasets.html#heatmap">cluster heatmaps</a> and a
<a href="/geo/info/datasets.html#compare">t-test sample comparison tool</a>.
Once you have identified gene expression <a href="/geo/info/profiles.html#chart">profile charts</a>
of interest, there are several types of <a href="/geo/info/profiles.html#e">neighbors links</a>
on the Profile records that help identify related genes of interest. Alternatively, if you prefer to perform your own
analysis using your favorite software package, the value matrix tables within the
DataSet full SOFT files available from the <a href="/geo/info/datasets.html#record">DataSet records</a>,
or the Series Matrix File or supplementary files linked at the foot of Series records, may prove suitable.
Finally, thousands of GEO data tracks have been uploaded for viewing on NCBI's Genome Data Viewer.
All records with tracks can be retrieved by searching with <a href="/gds/?term=track%5bfilter%5d">track[filter]</a>;
the 'See on Genome Data Viewer' button on those records links to corresponding tracks on NCBI's Genome Data Viewer
(see <a href="/genome/gdv/browser/?context=GEO&amp;acc=GSE86740">example tracks</a>).
(see <a href="/genome/gdv/browser/?context=GEO&amp;acc=GSE86740">example tracks</a>).
</p>
<a id="prog"></a>
<h3>Can GEO data be accessed programmatically?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. Users can take advantage of NCBI's <a href="/geo/info/geo_paccess.html">Entrez programming utilities</a> to access data stored in
<a href="/gds/">GEO DataSets</a> and
<a href="/geoprofiles/">GEO Profiles</a>.
The <a href="/geo/info/download.html">Construct a URL</a> feature is a popular mechanism to download complete metadata records in bulk.
Additionally, Bioconductor users may be interested in the
<a href="https://bioconductor.org/packages/GEOquery/">GEOquery package</a>,
which parses GEO SOFT files for integration with Bioconductor R analysis resources; see the
<a href="/pubmed/17496320/">publication</a>.
</p>
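<p>
As a minimal sketch of the E-utilities route, the Python snippet below builds an ESearch URL that queries GEO DataSets (Entrez database name <code>gds</code>).
The helper name <code>build_esearch_url</code> and the <code>retmax</code> and <code>retmode</code> values are illustrative choices, not part of this FAQ;
the search term is one of the example queries used elsewhere on this page. The snippet only constructs the URL and does not contact NCBI.
</p>

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, db="gds", retmax=20):
    """Return an ESearch URL for an Entrez query (hypothetical helper).

    db="gds" targets GEO DataSets; retmax and retmode are illustrative.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "/esearch.fcgi?" + urlencode(params)


# Example: records associated with Platform GPL96.
url = build_esearch_url("GPL96[GEO Accession]")
print(url)
```

<p>
The resulting URL can be fetched with any HTTP client; the identifiers it returns can then be passed to other E-utilities calls.
See the <a href="/geo/info/geo_paccess.html">Entrez programming utilities</a> page for the supported queries and usage guidelines.
</p>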
<a id="notifications"></a>
<h3>Can I get notified when new data is available?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. This can be accomplished using an <a href="https://account.ncbi.nlm.nih.gov/">NCBI</a> account.
Once you are logged in to NCBI,
construct a search for data relevant to your interests in <a href="/gds/">GEO DataSets</a>.
For example, if you are only interested in studies performed on Platform GPL96, search with
<a href="/gds?term=GPL96[GEO Accession]">GPL96[GEO Accession]</a>;
to see any apoptosis studies, search with <a href="/gds?term=apoptosis">apoptosis</a>;
or if you want to see all new studies, search with
<a href="/gds?term=all[filter]">all[filter]</a>.
Next to the search box, you should see a <em>Save Search</em> option. Selecting it
presents the option to receive e-mail alerts when new data matching your
search criteria have been added to the database. This database is updated daily.
</p>
<a id="usage"></a>
<h3>Can I cite data I find in GEO as evidence to support my own research?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
Yes. Users often cite data they find in GEO to support their own studies;
please see the list of <a href="/geo/info/citations.html">third-party usage citations</a> and guidelines for
<a href="/geo/info/linking.html">Citing data you find in GEO</a>.
</p>
<a id="seriesdataset"></a>
<h3>What is the difference between a Series and a DataSet?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
A GEO Series (GSExxx) is an original submitter-supplied record that summarizes a study.
These data are reassembled by GEO staff into curated GEO DataSets (GDSxxx). A DataSet
represents a collection of biologically and statistically comparable Samples processed
using the same Platform. Information reflecting experimental variables is provided through
DataSet subsets. Both Series and DataSets are searchable using the <a href="/gds/">GEO DataSets</a>
interface, but only DataSets form the basis of GEO's advanced data display and analysis tools
including gene expression profile charts and DataSet clusters; see the <a href="/geo/info/overview.html#org">Data organization</a>
document for more information. Not all submitted data are suitable for DataSet assembly
and we are experiencing a backlog in DataSet creation, so not all Series have a corresponding
DataSet record. When a curated DataSet is not available, it may be appropriate to analyze
the Series using <a href="/geo/geo2r/">GEO2R</a>, which compares groups of Samples and
identifies differentially expressed genes.
</p>
<a id="profileclusters"></a>
<h3>Why can't I find gene profile charts or DataSet clusters for my study of interest?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
As explained in the <a href="#seriesdataset">What is the difference between a Series and a DataSet?</a> FAQ above,
suitable submitter-supplied GEO Series records are reassembled by
GEO staff into curated DataSets. At periodic intervals, these DataSets are then indexed and
loaded into <a href="/geoprofiles/">GEO Profiles</a> and
<a href="/gds/">GEO DataSets</a>, which
allows users to query gene names, visualize charts and clusters, and more. If your
Series of interest has not yet been assembled into a DataSet, these features will not be available,
but it may be appropriate to analyze the Series using <a href="/geo/geo2r/">GEO2R</a>, which compares groups of
Samples and identifies differentially expressed genes.
</p>
<a id="redblue"></a>
<h3>What do the red bars and blue squares represent in GEO profile charts?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
In <a href="/geo/info/profiles.html#chart">GEO Profile charts</a>,
the red bars represent values extracted from original GEO Sample
records as supplied by submitters. For single channel data, values are
assumed to be submitted as normalized signal count data,
reflecting the relative measure of abundance of each transcript. For Affymetrix data, the "detection call"
(A=absent, P=present, M=marginal) data are taken into consideration, if supplied
(absent calls are faded out). For dual channel
experiments, values are normalized log ratios, and SAGE values reflect "tags per million" counts.
The blue squares represent the percentile ranked value
of a spot compared to all other spots within that Sample. That is, all
values within each Sample are rank ordered and placed into rank
percentile 'bins'. This gives an indication of the relative expression
level of that gene compared to all other genes on the array.
Value profiles are plotted on a scale that fits each individual gene,
whereas rank data are always plotted on a scale of 0-100%.
</p>
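<p>
The rank calculation behind the blue squares can be illustrated with a short Python sketch.
This is not GEO's implementation: the exact binning scheme is not described here, so the sketch simply assumes
that each value's percentile is its 1-based position in the sorted Sample divided by the number of values,
with ties broken by input order. The function name and the example values are hypothetical.
</p>

```python
def percentile_ranks(values):
    """Return each value's rank within one Sample as a percentage (0-100).

    Assumption: percentile = 100 * (1-based sorted position) / (number of values).
    """
    n = len(values)
    # Indices of the values, ordered from lowest to highest value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order, start=1):
        ranks[index] = 100.0 * position / n
    return ranks


# Hypothetical signal values for four spots in a single Sample.
sample = [12.0, 850.0, 47.5, 3.1]
print(percentile_ranks(sample))  # prints [50.0, 100.0, 75.0, 25.0]
```

<p>
Because ranks are computed within each Sample, they always fall on the same 0-100% scale regardless of how the submitter normalized the values.
</p>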
<a name="seq" id="seq"></a>
<h3>What data types are provided with next-generation sequence submissions?</h3>
<a href="#top" class="arrow" title="Back to top"></a>
<p>
<em>Processed sequence data files</em>: GEO hosts submitter-supplied processed sequence data files, which are
linked at the bottom of Sample and/or Series records as supplementary files. Requirements for processed data
files are not yet fully standardized and will depend on the nature of the study, but data typically
include genome tracks or expression counts.
</p>
<p>
<em>Raw sequence data files</em>:
Submitter-supplied raw data are loaded to NCBI's <a href="/sra/">Sequence Read Archive</a> (SRA) database.
Use the <a href="/Traces/study/?go=home">SRA Run Selector</a> to list and select Runs to be downloaded or analyzed with the
<a href="https://github.com/ncbi/sra-tools/wiki/">SRA Toolkit</a>.
If you have questions about the SRA format or the SRA Toolkit, please <a href="mailto:sra@ncbi.nlm.nih.gov">e-mail SRA</a> directly.
</p>
<p>
<em>NCBI-generated RNA-seq count data</em>: For some RNA-seq data, NCBI precomputes RNA-seq gene expression counts
and delivers them as count matrices that may be incorporated into commonly used differential expression
analysis and visualization software. For more information, see
<a href="/geo/info/rnaseqcounts.html">NCBI-generated RNA-seq count data</a>.
</p>
</div>
</div>
<div id="last_mod">
Last modified: September 5, 2024</div>
<div id="footer">
<span class="helpbar">|<a href="https://www.nlm.nih.gov"> NLM </a>|<a href="https://www.nih.gov"> NIH </a>|<a href="mailto:geo@ncbi.nlm.nih.gov"> Email GEO </a>|<a href="/geo/info/disclaimer.html"> Disclaimer </a>|<a href="https://www.nlm.nih.gov/accessibility.html"> Accessibility </a>|<a href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html"> HHS Vulnerability Disclosure </a>|
</span>
</div>
</div>
<script type="text/javascript" src="https://www.ncbi.nlm.nih.gov/portal/portal3rc.fcgi/rlib/js/InstrumentOmnitureBaseJS/InstrumentNCBIBaseJS/InstrumentPageStarterJS.js"></script>
</body>
</html>