<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>About GEO DataSets - GEO - NCBI</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="author" content="geo" />
<meta name="keywords" content="NCBI, national institutes of health, nih, database, archive, central, bioinformatics, biomedicine, geo, gene, expression, omnibus, chips, microarrays, oligonucleotide, array, sage, CGH" />
<meta name="description" content="Gene Expression Omnibus (GEO) is a database repository of high throughput gene expression data and hybridization arrays, chips, microarrays." />
<meta name="ncbiaccordion" content="collapsible: true, active: false" />
<meta name="ncbi_app" content="geo" />
<meta name="ncbi_pdid" content="documentation" />
<meta name="ncbi_page" content="About GEO DataSets" />
<link rel="shortcut icon" href="/geo/img/OmixIconBare.ico" />
<link rel="stylesheet" type="text/css" href="/geo/css/reset.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/nav.css" />
<link rel="stylesheet" type="text/css" href="/geo/css/info.css" />
<script type="text/javascript" src="/core/jig/1.15.10/js/jig.min.js"></script>
<script type="text/javascript" src="/geo/js/dd_menu.js"></script>
<script type="text/javascript" src="/geo/js/info.js"></script>
<script type="text/javascript">
jQuery.getScript("/core/alerts/alerts.js", function () {
galert(['#crumbs_login_bar', 'body &gt; *:nth-child(1)'])
});
</script>
<script type="text/javascript">
var ncbi_startTime = new Date();
</script>
</head>
<body id="info" class="datasets">
<div id="all">
<div id="page">
<div id="header">
<div id="ncbi_logo">
<a href="/">
<img src="/geo/img/ncbi_logo.gif" alt="NCBI Logo" />
</a>
</div>
<div id="geo_logo">
<a href="/geo/"><img src="/geo/img/geo_main.gif" alt="GEO Logo" /></a>
</div>
</div>
<div id="nav_bar">
<ul id="geo_nav_bar">
<li><a href="#">GEO Publications</a>
<ul class="sublist">
<li><a href="/geo/info/GEOHandoutFinal.pdf">Handout</a></li>
<li><a href="/pmc/articles/PMC10767856/">NAR 2024 (latest)</a></li>
<li><a href="/pmc/articles/PMC99122/">NAR 2002 (original)</a></li>
<li><a href="/pmc/?term=10767856,4944384,3531084,3341798,3013736,2686538,2270403,1669752,1619900,1619899,539976,99122">All publications</a></li>
</ul>
</li>
<li><a href="/geo/info/faq.html">FAQ</a></li>
<li><a href="/geo/info/MIAME.html" title="Minimum Information About a Microarray Experiment">MIAME</a></li>
<li><a href="mailto:geo@ncbi.nlm.nih.gov">Email GEO</a></li>
</ul>
</div>
<div id="crumbs_login_bar"><a title="NCBI home page" href="/">NCBI</a> »
<a id="curr_page" title="GEO home page" href="/geo/">GEO</a> »
<a title="GEO documentation guide" href="/geo/info/">Info</a> »
<span>About GEO DataSets</span><span id="login_status"><a href="/geo/submitter/" title="Click here to login. You need to do this only if you want to edit the contact information, submit data, see your unreleased data, or work with data already submitted by you. You do not need to login if you are here just to browse through public holdings">Login</a></span></div>
<div id="content">
<a name="top" id="top"></a>
<h1>About GEO DataSets</h1>
<ul class="page_menu">
<li><a href="#background">Background</a> </li>
<li><a href="#results">GEO DataSets Results Page</a></li>
<li><a href="#record">GEO DataSet Record</a></li>
<li>GEO DataSet Analysis Tools
<ul>
<li><a href="#findgenes">Find genes</a></li>
<li><a href="#compare">Compare 2 sets of samples</a></li>
<li><a href="#heatmap">Cluster heatmaps</a></li>
<li><a href="#distribution">Experiment design and value distribution</a></li>
</ul>
</li>
</ul>
<a name="background" id="background"></a>
<h2>Background</h2>
<p>
The <a href="/gds/">GEO DataSets</a> database stores original submitter-supplied records (Series, Samples and Platforms) as well as
curated DataSets. See the <a href="/geo/info/overview.html">Overview</a> for information about these
different record types and how they are related to each other.
</p>
<p>
Curated DataSets form the basis of GEO's advanced data display and analysis features, including tools to identify differences in gene
expression levels and cluster heatmaps. <a href="/geoprofiles/">GEO Profiles</a> are derived from GEO DataSets. Not all original
submitter-supplied records have been assembled into curated DataSets yet.
</p>
<p>
The GEO DataSets database can be searched using many different attributes including keywords, organism, DataSet type and authors.
Examples and full details about how to search for GEO DataSets of interest are provided in the
<a href="/geo/info/qqtutorial.html">Querying GEO DataSets and GEO Profiles</a> page.
</p>
<p>
Information about how to interpret GEO DataSets results pages and how to use the
Data Analysis Tools is provided within the following annotated screenshots.
</p>
<a name="results" id="results"></a>
<h2>GEO DataSets Results Page <a title="Back to top" class="arrow" href="#top"></a></h2>
<p>
Consult the <a href="#resultstable">table</a> below for information about how to use and interpret GEO DataSet results pages.
</p>
<img src="/geo/img/datasets_results.jpg" usemap="#gds_results-map" alt="Screenshot of GEO DataSets Results pages" />
<map id="gds_results-map" name="gds_results-map">
<area href="#a" alt="Search box" title="Search box" shape="rect" coords="177,27, 621,70" />
<area href="#b" alt="Display settings" title="Display settings" shape="rect" coords="5,80, 107,104" />
<area href="#c" alt="Title line" title="Title line" shape="rect" coords="5,117, 300,142" />
<area href="#d" alt="Summary, Type, Subsets, Supplementary Files and Samples" title="Summary, Type, Subsets, Supplementary Files and Samples" shape="rect" coords="5,149, 537,297" />
<area href="#e" alt="GEO Profiles and Links" title="GEO Profiles and Links" shape="rect" coords="529,98, 595,139" />
<area href="#f" alt="Filter your results" title="Filter your results" shape="rect" coords="600,78, 755,163" />
<area href="#g" alt="Thumbnail cluster image" title="Thumbnail cluster image" shape="rect" coords="536,186, 599,281" />
<area href="#h" alt="Find related data" title="Find related data" shape="rect" coords="600,179, 760,240" />
</map>
<a id="resultstable"></a>
<table class="overview">
<tbody>
<tr>
<th class="letter">A</th>
<td>Search box</td>
<td><a name="a" id="a"></a>Identify GEO DataSets of interest by entering keywords or a search statement into this box.
Various terms can be used in the search, including keywords, organism, DataSet type and authors.
Examples and full details about how to construct search statements are provided in the
<a href="/geo/info/qqtutorial.html">Querying GEO DataSets and GEO Profiles</a> page.
Search results can be saved in your <a href="/bookshelf/br.fcgi?book=helpmyncbi">My NCBI</a>
account using the <a href="/bookshelf/br.fcgi?book=helpmyncbi&amp;part=MyNCBI#MyNCBI.Collections">Collections</a> feature.
The <a href="/gds/advanced/">Advanced Search</a>
page provides user-friendly tools to help construct complex queries.</td>
</tr>
<tr>
<th class="letter">B</th>
<td>Display Settings and Send to</td>
<td><a name="b" id="b"></a>Use <em>Display Settings</em> to change the display format or the number of items to display.
Use <em>Send to</em> to export the results as a plain text File, or save the results to the Clipboard or your My NCBI
<a href="/bookshelf/br.fcgi?book=helpmyncbi&amp;part=MyNCBI#MyNCBI.Collections">Collections</a>.
</td>
</tr>
<tr>
<th class="letter">C</th>
<td>Title line</td>
<td><a name="c" id="c"></a>Lists the DataSet (GDS), Series (GSE) or Platform (GPL) accession number, followed by title and organism. </td>
</tr>
<tr>
<th class="letter">D</th>
<td>Summary, Type, Subsets, Supplementary Files and Samples</td>
<td><a name="d" id="d"></a>
<p>
<em>Summary</em>: A summary description of the DataSet, Series or Platform record.
</p>
<p>
<em>Type</em>: The DataSet or Series type. Types indicate the general application (e.g., expression profiling)
as well as the technology (e.g., high-throughput sequencing).
DataSet records also display the Sample Value Type.
</p>
<p>
<em>Subsets</em>: A summary of the number and type of experimental variable subsets represented in the DataSet.
</p>
<p>
<em>Supplementary Files</em>: Indicates the types of supplementary files that were supplied with the original submission.
Supplementary files usually refer to native raw data files, e.g., Affymetrix CEL files.
</p>
<p>
<em>Samples</em>: States the number of Samples in the DataSet or Series, and lists the Sample accession numbers (GSM) and titles.
</p>
</td>
</tr>
<tr>
<th class="letter">E</th>
<td>GEO Profiles and Links</td>
<td><a name="e" id="e"></a>
Reciprocal links to relevant records in other NCBI databases including PubMed, Epigenomics and SRA.
Links to corresponding GEO Profiles are provided on DataSets.
Links can also be retrieved in batch mode; see the <a href="#h">Find related data</a> section below.
</td>
</tr>
<tr>
<th class="letter">F</th>
<td>Filter your results</td>
<td><a name="f" id="f"></a>Lists the number of DataSet, Series and Platform records retrieved by your query.
Click to restrict your retrievals to a specific record type.
</td>
</tr>
<tr>
<th class="letter">G</th>
<td>Thumbnail cluster image</td>
<td><a name="g" id="g"></a>Clusters are provided on DataSets.
Click the image to be directed to the DataSet record, which contains several data analysis tools,
including cluster heatmaps; see the <a href="#heatmap">Cluster heatmaps</a> section below.
</td>
</tr>
<tr>
<th class="letter">H</th>
<td>Find related data</td>
<td><a name="h" id="h"></a>This feature is similar to that described in the <a href="#e">GEO Profiles and Links</a> section above, but in batch mode.</td>
</tr>
</tbody>
</table>
<a id="record"></a>
<h2>GEO DataSet Record <a title="Back to top" class="arrow" href="#top"></a></h2>
<p>
Consult the <a href="#recordtable">table</a> below for information about how to use and interpret GEO DataSet Records.
</p>
<img src="/geo/img/datasets_record.jpg" usemap="#gds_record-map" alt="Screenshot of GEO DataSet Record" />
<map id="gds_record-map" name="gds_record-map">
<area href="#i" alt="Descriptive information about the DataSet" title="Descriptive information about the DataSet" shape="rect" coords="3,60, 589,238" />
<area href="#j" alt="Thumbnail cluster image" title="Thumbnail cluster image" shape="rect" coords="587,94, 746,156" />
<area href="#k" alt="Download" title="Download" shape="rect" coords="588,157, 747,235" />
<area href="#l" alt="Data analysis tools" title="Data analysis tools" shape="rect" coords="3,241, 740,341" />
</map>
<a id="recordtable"></a>
<table class="overview">
<tbody>
<tr>
<th class="letter">I</th>
<td>Descriptive information about the DataSet</td>
<td><a name="i" id="i"></a>This section includes the DataSet title, summary, organism, Platform, citation(s),
the original (reference) Series upon which the DataSet is based, the type of values the Samples have, the number of Samples the DataSet contains and the date on which the original Series was made public.</td>
</tr>
<tr>
<th class="letter">J</th>
<td>Thumbnail cluster image</td>
<td><a name="j" id="j"></a>Click the image to be directed to the full-size default cluster heatmap (Uncentered Correlation UPGMA).
See the <a href="#heatmap">Cluster heatmaps</a> section below for details on cluster types and cluster program features.</td>
</tr>
<tr>
<th class="letter">K</th>
<td>Download</td>
<td><a name="k" id="k"></a>
<p>Several download options are provided, including: </p>
<p>
<em>DataSet full SOFT file</em> (recommended): Contains DataSet information, experiment variable subsets, expression value measurements
and comprehensive up-to-date gene annotation for the DataSet Platform (plain text, tab-delimited format).
</p>
<p>
<em>DataSet SOFT file</em>: Contains DataSet information, experiment variable subsets, expression value measurements and gene symbols
(plain text, tab-delimited format).
</p>
<p>
<em>Series family SOFT file</em>: Contains the complete, original, submitter-supplied records that form the basis of this DataSet
(plain text, tab-delimited format).
</p>
<p>
<em>Series family MINiML file</em>: Contains the complete, original, submitter-supplied records that form the basis of this DataSet (XML format).
</p>
<p>
<em>Annotation SOFT file</em>: Contains comprehensive up-to-date gene annotation for the DataSet Platform (plain text, tab-delimited format).
</p>
</td>
</tr>
<tr>
<th class="letter">L</th>
<td>Data analysis tools</td>
<td><a name="l" id="l"></a>Information about each of the Data Analysis Tools is provided in the sections below.</td>
</tr>
</tbody>
</table>
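<p>
The SOFT downloads listed under Download (K) above are plain text, so they can be read with a short script.
The following Python sketch is illustrative only and is not a GEO tool. It assumes that the value table sits between
<code>!dataset_table_begin</code> and <code>!dataset_table_end</code> marker lines; the function name
<code>parse_gds_soft</code> and the example excerpt (accessions, probe IDs and values) are invented for illustration.
</p>
<pre>
```python
import csv
import io


def parse_gds_soft(text):
    """Split DataSet SOFT text into metadata lines and the tab-delimited value table."""
    metadata = []
    table_lines = []
    in_table = False
    for line in text.splitlines():
        if line.startswith("!dataset_table_begin"):
            in_table = True
        elif line.startswith("!dataset_table_end"):
            in_table = False
        elif in_table:
            table_lines.append(line)
        elif line.strip():
            metadata.append(line)
    # The first table row is assumed to hold the column names.
    rows = list(csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t"))
    return metadata, rows


# Tiny invented excerpt, for illustration only (not real GEO data).
EXAMPLE = "\n".join([
    "^DATASET = GDS_EXAMPLE",
    "!dataset_title = Hypothetical example",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM_A1\tGSM_B1",
    "probe_1\tGENE1\t10.5\t12.0",
    "probe_2\tGENE2\tnull\t7.25",
    "!dataset_table_end",
])

metadata, rows = parse_gds_soft(EXAMPLE)
print(len(metadata), len(rows), rows[0]["IDENTIFIER"])
```
</pre>
<p>
Each row is returned as a dictionary keyed by column name. Values are left as strings so that missing entries can be handled explicitly before any numeric analysis.
</p>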
<a name="findgenes" id="findgenes"></a>
<h2>Find genes <a title="Back to top" class="arrow" href="#top"></a></h2>
<img src="/geo/img/datasets_find_genes.jpg" usemap="#gds_find_genes-map" alt="Screenshot of Find Genes" />
<map id="gds_find_genes-map" name="gds_find_genes-map">
<area href="#m" alt="Find genes" title="Find genes" shape="rect" coords="0,0, 567,112" />
</map>
<table class="overview">
<tbody>
<tr>
<th class="letter">M</th>
<td>Find genes</td>
<td><a name="m" id="m"></a>
<p>
<em>Find gene name or symbol</em>: Type in the name or symbol of the gene you want to locate in this DataSet,
and you will be directed to relevant Profiles.
</p>
<p>
<em>Find genes that are up/down for this condition(s)</em>: Use this feature to help identify genes that are flagged as having subset effects,
in other words, genes that are differentially expressed according to experimental subsets.
Subsets are groups of Samples within a DataSet that are categorized according to major
experimental variables, for example, gender, disease state, etc.
For DataSets that have more than one subset type you can restrict retrievals to genes that are expressed differentially
in one specific subset type by selecting/deselecting the check boxes as required.
The subset effect flag is calculated using the original submitter-supplied expression measurements
as contained in the VALUE column of the Sample records.
Given the diversity of data and VALUE types and ranges received at GEO,
this flag is calculated in a somewhat ad hoc manner and is only an attempt
to give potentially differentially-expressed genes higher visibility.
To perform more robust analyses, either try a t-test using the <a href="#compare">Compare 2 sets of samples</a> query tool
or upload the <a href="#k">DataSet full SOFT file</a> into your favorite microarray analysis software.
</p>
</td>
</tr>
</tbody>
</table>
<a name="compare" id="compare"></a>
<h2>Compare 2 sets of samples <a title="Back to top" class="arrow" href="#top"></a></h2>
<img src="/geo/img/datasets_compare_samples.jpg" usemap="#gds_compare_samples-map" alt="Screenshot of Compare 2 sets of samples" />
<map id="gds_compare_samples-map" name="gds_compare_samples-map">
<area href="#n" alt="Compare 2 sets of samples" title="Compare 2 sets of samples" shape="rect" coords="7,27, 545,131" />
<area href="#o" alt="Assign Samples to Group A and Group B" title="Assign Samples to Group A and Group B" shape="rect" coords="5,154, 345,390" />
</map>
<table class="overview">
<tbody>
<tr>
<th class="letter">N</th>
<td>Compare 2 sets of samples</td>
<td><a name="n" id="n"></a>
<p>The purpose of this tool is to help identify genes that display marked differences
in expression level between two sets of Samples (Group A and Group B).
Typically, users compare Samples that belong to different experiment variable subsets.
</p>
<p><em>Step 1</em>: Select the test to perform and a significance level.
The available tests are <a href="http://en.wikipedia.org/wiki/Student%27s_t-test">Student's t-test</a>, value means fold difference and rank means fold difference.
</p>
<p><em>Step 2</em>: Select which Samples to put in Group A and which Samples to put in Group B.
See <a href="#o">Section O</a> for details on how to assign Samples to Group A and Group B.
</p>
<p><em>Step 3</em>: Query Group A vs. Group B. The t-test score or means fold difference for each group is calculated.
Genes that pass the user-selected criteria are presented in GEO Profiles.
</p>
<p><em>Notes and caveats</em>: Calculations are based on the original submitter-supplied expression measurements
as contained in the VALUE column of the Sample records.
Note that there is great diversity in the data values and ranges provided by GEO submitters.
The <a href="http://en.wikipedia.org/wiki/Student%27s_t-test">Student's t-test</a> is a well-established statistical method for determining whether the means of two
sets of data differ significantly. The t-test makes basic assumptions about the data, so results may be wrong or misleading when these assumptions do not hold. The t-test requires at least 2 samples in each group.
Value or rank means fold difference is perhaps the most rudimentary method to filter data.
Retrievals may have no statistical significance, or compared subsets may be too small to provide any statistical value (e.g., singletons).
If values are null or absent they are ignored in the calculations.
If one group of values is empty, its value is assumed to be zero for mean group fold. If both groups of values are empty, the profile is skipped.
The result set may be empty if no profiles pass the criteria. There is no way to know a priori what filter to use to provide meaningful results or that meaningful results will be obtained.
</p>
</td>
</tr>
<tr>
<th class="letter">O</th>
<td>Assign Samples to Group A and Group B</td>
<td><a name="o" id="o"></a>
Select which Samples you want to assign to Group A (left column) and which to Group B (right column).
The colored blocks in the middle provide information on the experimental variable subsets within the DataSet.
Click on the Sample accession numbers (GSMxxx) to select Samples individually, or click on colored blocks and then on blinking arrows to select entire groups of Samples.
You may limit Samples in groups by unchecking the boxes for any groups or Samples you do not wish to include.
In the example above, the user has opted to compare all 'non-diabetic' Samples (Group A) with all 'type 2 diabetes' Samples (Group B).
</td>
</tr>
</tbody>
</table>
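<p>
The calculations described under Compare 2 sets of samples (N) above can be approximated offline.
The following Python sketch is not GEO's implementation. It assumes a pooled-variance (equal variance) Student's t statistic and a
value means fold difference computed as mean of Group A divided by mean of Group B. The function names, and the choice to return
<code>None</code> when the Group B mean is zero, are assumptions made for this example. Significance filtering and rank means are omitted.
</p>
<pre>
```python
import math
import statistics


def clean(values):
    """Drop null or absent values, which are ignored in the calculations."""
    return [float(v) for v in values if v is not None and v != "null"]


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance."""
    a, b = clean(group_a), clean(group_b)
    if len(a) in (0, 1) or len(b) in (0, 1):
        raise ValueError("the t-test requires at least 2 values in each group")
    dof = len(a) + len(b) - 2
    pooled = ((len(a) - 1) * statistics.variance(a)
              + (len(b) - 1) * statistics.variance(b)) / dof
    if pooled == 0:
        raise ValueError("both groups are constant; t is undefined")
    std_err = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


def mean_fold(group_a, group_b):
    """Ratio of group means; an empty group is treated as having a mean of zero."""
    a, b = clean(group_a), clean(group_b)
    if not a and not b:
        return None  # both groups empty: the profile is skipped
    mean_a = statistics.mean(a) if a else 0.0
    mean_b = statistics.mean(b) if b else 0.0
    if mean_b == 0:
        return None  # fold is undefined here (a choice made by this sketch)
    return mean_a / mean_b


# Invented values for one gene profile; None stands for a null measurement.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0, None]
print(student_t(group_a, group_b), mean_fold(group_a, group_b))
```
</pre>
<p>
A profile would pass the filter only if its score meets the user-selected criteria, so the choice of threshold remains with the analyst.
</p>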
<a name="heatmap" id="heatmap"></a>
<h2>Cluster heatmaps <a title="Back to top" class="arrow" href="#top"></a></h2>
<img src="/geo/img/datasets_cluster.jpg" usemap="#gds_cluster-map" alt="Screenshot of Cluster Heatmaps" />
<map id="gds_cluster-map" name="gds_cluster-map">
<area href="#p" alt="Cluster heatmaps" title="Cluster heatmaps" shape="rect" coords="5,30, 530,132" />
<area href="#q" alt="Cluster options" title="Cluster options" shape="rect" coords="7,206, 360,232" />
<area href="#r" alt="Select regions of interest on the heatmap image" title="Select regions of interest on the heatmap image" shape="rect" coords="5,322, 409,418" />
</map>
<table class="overview">
<tbody>
<tr>
<th class="letter">P</th>
<td>Cluster heatmaps</td>
<td><a name="p" id="p"></a> The full range of cluster types is available from this section,
including unsupervised hierarchical clusters, K-means/K-median clusters,
and heatmaps organized by location of genes on the chromosome.
Background information and details about each cluster type are provided on the
<a href="/geo/info/cluster.html">GEO Dataset Cluster Analysis</a> page.
</td>
</tr>
<tr>
<th class="letter">Q</th>
<td>Cluster options</td>
<td><a name="q" id="q"></a>Options for downloading, plotting or exporting selected data to GEO Profiles, and changing the colors of the heatmap are available. For hierarchical clusters, it is also possible to change the cluster type from this area.</td>
</tr>
<tr>
<th class="letter">R</th>
<td>Select regions of interest on the heatmap image</td>
<td><a name="r" id="r"></a>Click the heatmap image to select a region of the cluster for further analysis.
A faded selection box will appear; drag and/or resize the height of the box to cover the region of interest.
To select more than one region, click the '+' icon on the left side of the selection box then repeat the process to select more regions.
To zoom in to a selected region, either double-click the selection box or click "Stack up" to view multiple selected regions.
Gene symbols are listed on the right side of the zoomed-in cluster.
It is possible to search for specific genes within this list using the <i>Ctrl+F</i> function of your browser.
Use the 'Download', 'Plot values' or 'View in Entrez' buttons to retrieve data for the selected region.
</td>
</tr>
</tbody>
</table>
<a name="distribution" id="distribution"></a>
<h2>Experiment design and value distribution <a title="Back to top" class="arrow" href="#top"></a></h2>
<img src="/geo/img/datasets_experiment.jpg" usemap="#gds_experiment-map" alt="Screenshot of Distribution" />
<map id="gds_experiment-map" name="gds_experiment-map">
<area href="#s" alt="Experiment design and value distribution" title="Experiment design and value distribution" shape="rect" coords="5,5, 600,224" />
</map>
<table class="overview">
<tbody>
<tr>
<th class="letter">S</th>
<td>Experiment design and value distribution</td>
<td><a name="s" id="s"></a>Depicts a <a href="http://en.wikipedia.org/wiki/Boxplot">box plot</a> displaying the distribution of expression values of each Sample within a DataSet. The plot is useful for determining whether the DataSet is normalized, i.e., the value distributions are median-centered across Samples.
The colored bars at the bottom of the chart represent experimental variable subsets within the DataSet. Each subset has a type, e.g., 'age', and a description, e.g., '8 week'. For example, in the chart above, the first Sample GSM9920 is derived from an 8-week-old, non-diabetic mouse.</td>
</tr>
</tbody>
</table>
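<p>
The median-centering check that the box plot supports visually can also be done numerically.
The following Python sketch is an illustration, not part of GEO. It compares the median of each Sample's values and treats the DataSet
as median-centered when the medians differ by no more than a tolerance. The function names, the Sample labels and the default tolerance
of 0.1 are hypothetical, and a suitable tolerance depends on the value type and scale.
</p>
<pre>
```python
import math
import statistics


def sample_medians(values_by_sample):
    """Median of the non-null values for each Sample (the center line of each box)."""
    return {
        sample: statistics.median(v for v in values if v is not None)
        for sample, values in values_by_sample.items()
    }


def is_median_centered(values_by_sample, tolerance=0.1):
    """True when the spread between Sample medians is within the tolerance."""
    medians = sample_medians(values_by_sample).values()
    spread = max(medians) - min(medians)
    return math.isclose(spread, 0.0, abs_tol=tolerance)


# Invented values for illustration; None stands for a null measurement.
centered = {
    "SAMPLE_1": [1.0, 2.0, 3.0],
    "SAMPLE_2": [1.5, 2.0, 2.5, None],
    "SAMPLE_3": [0.0, 2.05, 9.0],
}
shifted = {
    "SAMPLE_1": [1.0, 2.0, 3.0],
    "SAMPLE_2": [4.0, 5.0, 6.0],
}
print(is_median_centered(centered), is_median_centered(shifted))
```
</pre>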
</div>
</div>
<div id="last_mod">
Last modified: July 16, 2024</div>
<div id="footer">
<span class="helpbar">|<a href="https://www.nlm.nih.gov"> NLM </a>|<a href="https://www.nih.gov"> NIH </a>|<a href="mailto:geo@ncbi.nlm.nih.gov"> Email GEO </a>|<a href="/geo/info/disclaimer.html"> Disclaimer </a>|<a href="https://www.nlm.nih.gov/accessibility.html"> Accessibility </a>|<a href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html"> HHS Vulnerability Disclosure </a>|
</span>
</div>
</div>
<script type="text/javascript" src="https://www.ncbi.nlm.nih.gov/portal/portal3rc.fcgi/rlib/js/InstrumentOmnitureBaseJS/InstrumentNCBIBaseJS/InstrumentPageStarterJS.js"></script>
</body>
</html>