<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv='X-UA-Compatible' content='IE=Edge'>
<meta charset="utf-8">
<meta name='viewport' content='width=device-width, initial-scale=1.0'>
<link rel="shortcut icon" href="//www.niehs.nih.gov/resources/favicons/www/fav-57.png"/>
<link rel="apple-touch-icon" sizes="57x57" href="//www.niehs.nih.gov/resources/favicons/www/fav-57.png">
<link rel="apple-touch-icon" sizes="72x72" href="//www.niehs.nih.gov/resources/favicons/www/fav-72.png">
<link rel="apple-touch-icon" sizes="114x114" href="//www.niehs.nih.gov/resources/favicons/www/fav-114.png">
<link rel="apple-touch-icon" sizes="144x144" href="//www.niehs.nih.gov/resources/favicons/www/fav-144.png">
<title>ORIO | Manual</title>

<link rel='stylesheet' type='text/css'
      href='//cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/css/bootstrap.min.css'>
<link rel="stylesheet" type="text/css"
      href="//cdnjs.cloudflare.com/ajax/libs/font-awesome/4.4.0/css/font-awesome.min.css">
<link rel="stylesheet" type="text/css"
      href="//cdnjs.cloudflare.com/ajax/libs/toastr.js/2.1.2/toastr.css">
<link rel="stylesheet" type="text/css" media="all" href="/static/css/site.css">

<script async id="_fed_an_ua_tag" charset="utf-8"
        src="/static/js/ufa.min.js?agency=HHS&subagency=NIH"></script>
</head>
<body id="" class="">
<div id='wrap'>

<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div id="content" class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href='/'>ORIO</a>
</div>
<div class="navbar-collapse collapse">
<ul class="nav navbar-nav">
<li><a href="mailto:orio@niehs.nih.gov?subject=feedback">
<i class="fa fa-fixed fa-envelope-o"></i> Contact us</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li><a href="/quickstart/">Getting started</a></li>
<li><a href="/help/">Help</a></li>
<li><a href="/accounts/login/">Log in</a></li>
</ul>
</div>
</div>
</div>

<div id="content" class="container-fluid">
<div id="mainContent" class="container">
<h2>ORIO help</h2>
<p>
ORIO (Online Resource for Integrative Omics) is an analysis platform for data
from next-generation sequencing (NGS). ORIO enables rapid analysis and
integration of NGS data sets. ORIO was designed based on three central
observations:
</p>
<ol>
<li>
Diverse biological phenomena may be represented by discrete positions in
genomic space, for example protein binding sites in transcription factor
regulation or transcription start sites in transcription initiation.
</li>
<li>
Despite the wide diversity of NGS experiments and data types, analysis of
NGS data often involves the consideration and manipulation of genomic read
coverage.
</li>
<li>
Visual inspection remains a critical component of analysis.
</li>
</ol>
<p>
The bulk of analysis is performed using the
<a href="https://github.com/NIEHS/orio">ORIO analysis package</a>
<span class="glyphicon glyphicon-new-window"></span>. An ORIO analysis run
consists of two steps. First, the intersections between a feature list of
genomic coordinates and a number of NGS data sets are found. Second, the NGS
data sets are correlated based on these intersection values. The output of
these steps may be dynamically visualized using
<a href="https://github.com/NIEHS/orio-web">ORIO-web</a>
<span class="glyphicon glyphicon-new-window"></span>.
</p>

<img src="/static/img/orio_doc.png">
<p>
ORIO has been published in
<a href="https://doi.org/10.1093/nar/gkx270">Lavender et al. 2017</a>
<span class="glyphicon glyphicon-new-window"></span>. To cite ORIO in your
publications:
</p>

<p>
Lavender CA, Shapiro AJ, Burkholder AB, Bennett BD, Adelman K, Fargo DC. ORIO
(Online Resource for Integrative Omics): a web-based platform for rapid
integration of next generation sequencing data. Nucleic Acids Res.
2017 Jun 2;45(10):5678-5690. doi: 10.1093/nar/gkx270.
</p>
<h3>Data intersection</h3>
<p>
For each NGS data set in an analysis, ORIO finds the intersection of that data
set with the feature list. This intersection describes the overlap of NGS read
coverage with genomic windows anchored on feature list positions.
</p>

<img src="/static/img/matrix_py_doc.png">
<p>
ORIO focuses its analysis on a selected list of genomic coordinates called a
feature list. This feature list may be uploaded as a BED file, or the user may
select from genomic feature lists hosted by ORIO. Analysis considers a genomic
window around each feature. The dimensions of the windows may be adjusted using
the ‘bin start,’ ‘bin number,’ and ‘bin size’ parameters when setting up an
analysis.
</p>
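<p>
The window geometry can be illustrated with a minimal sketch. It is not code
from the ORIO package: it assumes that ‘bin start’ is a signed offset in base
pairs from the feature position (negative values reach upstream) and that bins
are contiguous, half-open intervals. The function name <code>bin_edges</code>
is hypothetical.
</p>

```python
from typing import List, Tuple


def bin_edges(feature_pos: int, bin_start: int, bin_number: int,
              bin_size: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals for each bin of one window.

    Assumption: bin_start is an offset in bp relative to feature_pos, so the
    window spans feature_pos + bin_start up to
    feature_pos + bin_start + bin_number * bin_size. Chromosome ends are not
    handled here.
    """
    window_start = feature_pos + bin_start
    return [
        (window_start + i * bin_size, window_start + (i + 1) * bin_size)
        for i in range(bin_number)
    ]


if __name__ == "__main__":
    # A 1 kb window centred on position 10,000, split into ten 100 bp bins.
    for start, end in bin_edges(10_000, -500, 10, 100):
        print(start, end)
```

<p>
With these hypothetical settings the window runs from 9,500 to 10,500, and the
total window width is always bin number multiplied by bin size.
</p>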
<p>
ORIO iteratively finds the intersection of the selected NGS data sets with the
genomic feature list: for each data set, the reads intersecting each feature
window are found. Data sets may be uploaded as read coverage bigWig files. If
stranded data is being considered, two separate bigWig files corresponding to
the forward and reverse strands may be used. Alternatively, the user may select
from hosted data sets taken from the first production run of ENCODE.
</p>
<p>
ORIO can take strand information into account when finding data intersections.
If strand information is included in the associated BED file, read coverage is
found respecting the strand of each feature: areas downstream of a feature are
given higher positions in the window, while areas upstream are given lower
positions. If the NGS data is stranded (i.e., forward and reverse strand
bigWigs are available), then only coverage on the same strand as a stranded
feature is considered.
</p>
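<p>
The strand handling described above might look like the following sketch. It
illustrates the idea under stated assumptions rather than reproducing ORIO's
implementation: coverage is a hypothetical per-base list for one chromosome,
a minus-strand feature's window is mirrored around the feature and its bins are
reversed so that bin 0 is always the most upstream bin, and (an assumption not
covered by this manual) features without a strand ('.') on stranded data use
the sum of both strands. Function names are hypothetical.
</p>

```python
from typing import List, Optional, Sequence


def bin_sums(coverage: Sequence[float], window_start: int, bin_number: int,
             bin_size: int) -> List[float]:
    """Sum per-base coverage in consecutive bins, in genomic orientation."""
    sums = []
    for i in range(bin_number):
        # Clamp at 0 so windows near a chromosome start do not wrap around.
        lo = max(0, window_start + i * bin_size)
        hi = max(0, window_start + (i + 1) * bin_size)
        sums.append(sum(coverage[lo:hi]))
    return sums


def stranded_bins(pos: int, strand: str, bin_start: int, bin_number: int,
                  bin_size: int, forward: Sequence[float],
                  reverse: Optional[Sequence[float]] = None) -> List[float]:
    """Bin values for one feature, oriented by the feature's strand.

    forward/reverse are per-base coverage tracks; reverse=None means the
    data are unstranded and forward holds the only track.
    """
    if strand == "-":
        # Offset o downstream of a minus-strand feature lies at genomic
        # position pos - o, so the window spans
        # pos - (bin_start + bin_number * bin_size) + 1 through pos - bin_start.
        start = pos - (bin_start + bin_number * bin_size) + 1
        # Use same-strand (reverse) coverage when the data are stranded.
        track = reverse if reverse is not None else forward
        # Reverse the bins so bin 0 is the most upstream bin of the feature.
        return bin_sums(track, start, bin_number, bin_size)[::-1]
    fwd = bin_sums(forward, pos + bin_start, bin_number, bin_size)
    if strand == "+" or reverse is None:
        return fwd
    # Unstranded feature on stranded data (assumption): combine both strands.
    rev = bin_sums(reverse, pos + bin_start, bin_number, bin_size)
    return [f + r for f, r in zip(fwd, rev)]


if __name__ == "__main__":
    fwd = list(range(20))
    print(stranded_bins(10, "+", -4, 4, 2, fwd))  # bins read left to right
    print(stranded_bins(10, "-", -4, 4, 2, fwd))  # mirrored window, reversed
```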
<p>
The product of the data intersection is a two-dimensional matrix, where each row
corresponds to a genomic feature and each column corresponds to a bin of the
genomic window. The user can download these matrices through the ‘Download zip’
option on an analysis page; the zip file contains all data relevant to an
analysis. Matrices generated in the data intersection step are then used in the
correlative analysis step.
</p>
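<p>
To illustrate the matrix layout only, the sketch below builds one data set's
feature-by-bin matrix from a hypothetical per-base coverage list (unstranded,
one chromosome), with one row per feature and one column per bin, and computes
each feature's window total. The feature record, function names, and dict-based
layout are assumptions; the file layout inside the ORIO zip download is not
described here.
</p>

```python
from typing import Dict, List, Sequence, Tuple

# (feature name, anchor position): a hypothetical, minimal feature record.
Feature = Tuple[str, int]


def intersection_matrix(features: Sequence[Feature],
                        coverage: Sequence[float], bin_start: int,
                        bin_number: int,
                        bin_size: int) -> Dict[str, List[float]]:
    """One row per feature, one column per bin of that feature's window."""
    matrix: Dict[str, List[float]] = {}
    for name, pos in features:
        row = []
        for i in range(bin_number):
            lo = max(0, pos + bin_start + i * bin_size)
            hi = max(0, pos + bin_start + (i + 1) * bin_size)
            row.append(sum(coverage[lo:hi]))
        matrix[name] = row
    return matrix


def window_totals(matrix: Dict[str, List[float]]) -> Dict[str, float]:
    """Total coverage per feature window (the row sums of the matrix)."""
    return {name: sum(row) for name, row in matrix.items()}


if __name__ == "__main__":
    cov = [0] * 50 + [2] * 50  # coverage rises at position 50
    feats = [("a", 50), ("b", 10)]
    m = intersection_matrix(feats, cov, bin_start=-10, bin_number=4, bin_size=5)
    print(m)                 # {'a': [0, 0, 10, 10], 'b': [0, 0, 0, 0]}
    print(window_totals(m))  # {'a': 20, 'b': 0}
```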
<h3>Correlative analysis</h3>
<p>
Using matrices generated in the data intersection step, ORIO then performs
correlative analysis based on compiled read coverage values. NGS data sets are
grouped by hierarchical clustering, and genomic features are grouped by k-means
clustering. Associations discovered through clustering may point to the
coordination of biological functions.
</p>

<img src="/static/img/matrixByMatrix_py_doc.png">
<p>
Each NGS data set has a matrix of coverage values covering every genomic feature
in an analysis. For each pair of data sets, the Spearman correlation is computed
over the coverage values at each feature, where the coverage value for a feature
is the sum of coverage across all bins in its genomic window. Hierarchical
clustering is then performed using the Spearman rho values as the basis of the
pairwise distance metric.
</p>
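<p>
The data-set correlation step can be sketched with the standard library.
Spearman's rho is the Pearson correlation of the two variables' ranks, with tied
values given their average rank; SciPy's <code>scipy.stats.spearmanr</code>
computes the same quantity, and the ORIO package may well use a library routine
rather than code like this. The input layout (a dict of per-data-set matrices
whose rows share the same feature order) and the function names are assumptions.
This manual does not say how rho is turned into a dendrogram distance, so the
sketch stops at the rho values.
</p>

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when one input is constant
    return sxy / math.sqrt(sxx * syy)


def pairwise_spearman(
        matrices: Dict[str, Sequence[Sequence[float]]]
) -> Dict[Tuple[str, str], float]:
    """Rho for every pair of data sets, using each feature's window total."""
    totals = {name: [sum(row) for row in m] for name, m in matrices.items()}
    return {(a, b): spearman(totals[a], totals[b])
            for a, b in combinations(totals, 2)}


if __name__ == "__main__":
    data = {
        "ds1": [[1, 1], [2, 2], [3, 3]],  # window totals 2, 4, 6
        "ds2": [[0, 1], [5, 5], [9, 9]],  # totals 1, 10, 18
        "ds3": [[3, 3], [2, 1], [0, 0]],  # totals 6, 3, 0
    }
    for pair, rho in pairwise_spearman(data).items():
        print(pair, rho)
```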
<p>
To cluster genomic features, the total read coverage in a genomic window for
each NGS data set is concatenated to give a one-dimensional data vector for each
feature in an analysis. These vectors are normalized by the variance in each
data set. For each pair of features, the Euclidean distance between these
normalized data vectors is found. k-means clustering is then performed on these
distances for each k from 2 to 10, and the cluster assignments for each k are
saved for later display.
</p>
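<p>
A standard-library sketch of the feature-clustering step follows. It assumes
that "normalized by the variance" means scaling each data set (column) to unit
variance by dividing by its standard deviation, as
<code>scipy.cluster.vq.whiten</code> does, and it uses Lloyd's algorithm with
centroids initialized from randomly chosen data vectors and a fixed seed for
reproducibility. The function names and seed are hypothetical, and ORIO's own
implementation may differ, for example in its stopping rule or its handling of
empty clusters.
</p>

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]


def scale_to_unit_variance(vectors: Sequence[Sequence[float]]) -> List[Vector]:
    """Divide each column (one data set) by its population standard deviation."""
    scaled = [list(map(float, v)) for v in vectors]
    n = len(scaled)
    for d in range(len(scaled[0])):
        column = [row[d] for row in scaled]
        mean = sum(column) / n
        sd = (sum((c - mean) ** 2 for c in column) / n) ** 0.5
        if sd > 0:  # leave constant columns unscaled
            for row in scaled:
                row[d] /= sd
    return scaled


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance picks the same nearest centroid as Euclidean.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: List[Vector], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    rng = random.Random(seed)
    # Initialise centroids by randomly selecting k of the data vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def cluster_features(totals: Dict[str, Sequence[float]],
                     k_values: Iterable[int] = range(2, 11)
                     ) -> Dict[int, Dict[str, int]]:
    """Cluster label per feature for each k (k = 2..10 by default)."""
    names = list(totals)
    scaled = scale_to_unit_variance([totals[name] for name in names])
    results: Dict[int, Dict[str, int]] = {}
    for k in k_values:
        if k > len(names):
            break  # cannot form more clusters than there are features
        labels, _ = kmeans(scaled, k)
        results[k] = dict(zip(names, labels))
    return results


if __name__ == "__main__":
    totals = {  # feature -> window totals in two hypothetical data sets
        "f1": [0.0, 0.0], "f2": [0.1, 0.0],
        "f3": [10.0, 10.0], "f4": [10.1, 10.0],
    }
    print(cluster_features(totals, k_values=[2]))
```

<p>
With two well-separated groups, k = 2 places f1 with f2 and f3 with f4; the
label numbers themselves depend on the random initialization.
</p>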
<p>
Though read coverage is informative for many genomics experiments, some NGS
experiments require specialized analytical techniques to be applied to read
coverage in order to generate useful data metrics. In addition, many non-NGS
approaches are relevant for genomics analysis. Acknowledging this, ORIO allows
the user to provide a single data value for each genomic feature, to be used in
the correlative analysis of independent NGS data sets. We call this data set the
sort vector. A sort vector may be provided at the onset of analysis as a
two-column, tab-delimited text file in which the first column contains feature
names and the second contains data values.
</p>
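<p>
Reading such a file might look like the following sketch. It assumes there is
no header row, skips blank lines, and rejects rows that do not have exactly two
tab-separated columns, non-numeric values, and duplicate feature names; these
rules and the function names are assumptions rather than ORIO-web's documented
validation.
</p>

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def read_sort_vector(handle: TextIO) -> Dict[str, float]:
    """Parse a two-column, tab-delimited sort vector: feature name, value."""
    values: Dict[str, float] = {}
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
        name, raw = row[0].strip(), row[1].strip()
        if name in values:
            raise ValueError(f"line {line_no}: duplicate feature {name!r}")
        try:
            values[name] = float(raw)
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric value {raw!r}") from None
    return values


def missing_features(sort_vector: Dict[str, float],
                     feature_names: Iterable[str]) -> List[str]:
    """Features in the associated feature list that lack a sort-vector value."""
    return [name for name in feature_names if name not in sort_vector]


if __name__ == "__main__":
    text = "geneA\t1.5\ngeneB\t-2\n\n"
    sv = read_sort_vector(io.StringIO(text))
    print(sv)                                        # {'geneA': 1.5, 'geneB': -2.0}
    print(missing_features(sv, ["geneA", "geneC"]))  # ['geneC']
```

<p>
Checking the parsed names against the associated feature list catches features
that would otherwise have no value in the correlative analysis.
</p>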
<p>
If a sort vector is used, hierarchical clustering of data sets is focused on the
sort vector. Read coverage values for each NGS data set are correlated with the
data values in the sort vector by Spearman correlation, separately for the read
coverage in each genomic window bin. For each data set, the correlation values
for each bin are concatenated into a one-dimensional vector. For each pair of
data sets, the Euclidean distance between these vectors is found and used as the
distance metric in hierarchical clustering. k-means clustering is performed the
same way in analyses with and without a sort vector.
</p>
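<p>
The sort-vector variant can be sketched in the same way. The sketch assumes each
data set's matrix rows are in the same feature order as the sort-vector values;
it computes one Spearman rho per bin column and then the Euclidean distance
between the data sets' rho profiles. Treating an undefined correlation (a bin
with constant coverage) as 0 is an assumption made here so that distances stay
finite, and the function names are hypothetical.
</p>

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = features, columns = bins


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho; 0.0 when either input is constant (assumption)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rho_profile(matrix: Matrix, sort_values: Sequence[float]) -> List[float]:
    """Spearman rho between each bin's coverage column and the sort vector."""
    return [spearman([row[b] for row in matrix], sort_values)
            for b in range(len(matrix[0]))]


def profile_distances(matrices: Dict[str, Matrix], sort_values: Sequence[float]
                      ) -> Dict[Tuple[str, str], float]:
    """Euclidean distance between the rho profiles of every data-set pair."""
    profiles = {name: rho_profile(m, sort_values) for name, m in matrices.items()}
    return {(a, b): math.dist(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}


if __name__ == "__main__":
    mats = {
        "ds1": [[1, 3], [2, 2], [3, 1]],  # bin 0 rises, bin 1 falls
        "ds2": [[1, 1], [2, 2], [3, 3]],  # both bins rise
    }
    sort_values = [10, 20, 30]
    print(rho_profile(mats["ds1"], sort_values))
    print(profile_distances(mats, sort_values))
```

<p>
The resulting pairwise distances are what a hierarchical clustering routine
would consume; the linkage method ORIO uses is not specified in this manual.
</p>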
<p>
Correlative analysis results are stored for access and display by the web
application ORIO-web.
</p>
<h3>Data management and display of results</h3>
<p>
ORIO-web is a web application designed to maintain and organize data for
analysis by ORIO. ORIO-web also provides dynamic visualization of ORIO results.
Together, ORIO and ORIO-web allow for fast, flexible, and informative
integration of whole-genome data through an intuitive web interface.
</p>
<h3>Account management</h3>
<p>
The ORIO-web landing page asks a user to create an account associated with an
email address. All data and analyses managed by ORIO-web are associated with a
user account. Most data is privately associated with a user account; however,
ORIO-web does allow individual analyses to be designated as public, allowing
for rapid sharing of results by URL.
</p>
<h3>Data management</h3>
<p>
ORIO-web manages inputs for the ORIO analysis package. Feature lists, NGS data
sets, and sort vectors are associated with a given user account.
</p>
<p>
Data management controls are found by clicking on the 'Manage data' link button.
On the 'Data management' page, headers designate the 'Feature lists',
'Sort vectors', and 'User dataset' sections. Entries may be modified or deleted
by clicking on them under each header, and new entries may be created by
clicking on the 'Create new' buttons.
</p>
<p>
When creating new entries, each data type requires a name, an associated genome
assembly, and a correctly formatted data set. Feature lists may be specified as
stranded; if so, the strand must be given in the sixth column of the associated
BED file for every entry. Sort vectors must be associated with an existing
feature list, which must be specified when the sort vector is created.
</p>
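<p>
A pre-upload check for a stranded feature list might look like this sketch. It
applies the standard BED conventions (tab-separated columns, integer start and
end with start not negative and end not before start, strand '+' or '-' in
column 6) and skips lines beginning with 'track', 'browser', or '#'. It is not
ORIO-web's own validator, and the function name is hypothetical.
</p>

```python
from typing import Iterable, List, Tuple


def validate_bed(lines: Iterable[str], stranded: bool) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for a BED feature list."""
    problems: List[Tuple[int, str]] = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # blank or header line
        cols = line.split("\t")
        if len(cols) < 3:
            problems.append((n, "fewer than 3 columns"))
            continue
        try:
            start, end = int(cols[1]), int(cols[2])
        except ValueError:
            problems.append((n, "start/end are not integers"))
            continue
        if start < 0 or end < start:
            problems.append((n, "invalid coordinates"))
        if stranded and (len(cols) < 6 or cols[5] not in ("+", "-")):
            problems.append((n, "strand ('+' or '-') missing from column 6"))
    return problems


if __name__ == "__main__":
    bed = [
        "chr1\t100\t200\tg1\t0\t+",
        "chr1\t300\t400\tg2\t0\t.",  # '.' is not a usable strand
        "chr1\t500\t600",            # only three columns
    ]
    print(validate_bed(bed, stranded=True))   # problems on lines 2 and 3
    print(validate_bed(bed, stranded=False))  # []
```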
<p>
NGS data sets are uploaded to the tool as read coverage bigWig files. Given the
large size of these files, we require them to be hosted by the user and to be
publicly accessible by HTTP download. When creating a data set entry, the user
must provide a valid URL for HTTP access.
</p>
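<p>
Because the bigWig file is read over HTTP, it can be worth checking a URL before
creating the entry. The sketch below is a hypothetical client-side check using
only the Python standard library: it confirms the URL uses http or https and,
optionally, sends a HEAD request so that the large file body is not downloaded.
Some servers reject HEAD requests, so a False result is not proof that the file
is unreachable, and ORIO-web's own validation may differ.
</p>

```python
import http.client
import urllib.request
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """True if the URL uses http/https and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def is_publicly_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request (no body download) and report success."""
    if not is_http_url(url):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects and raises HTTPError for 4xx/5xx replies.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print(is_http_url("https://example.org/tracks/sample.bw"))  # True
    print(is_http_url("ftp://example.org/tracks/sample.bw"))    # False
```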
<h3>Analysis management</h3>
<p>
Completed and pending analyses are presented on the ORIO-web dashboard.
</p>
<ul>
<li>
<b>Create analysis.</b> An analysis can be created by clicking on the
'Proceed to run setup' button. An analysis requires a name, a genome
assembly, and a feature list. Selecting a genome assembly populates the
drop-down menus for feature lists, sort vectors, and user-uploaded data
sets, as well as the ENCODE data selection fields. ENCODE data selection
was designed to navigate the diverse data generated by the ENCODE
project. Fields such as 'Data type', 'Cell type', and 'Antibody' may be
used to quickly narrow all ENCODE data sets down to those passing the
filter criteria. The user may then select individual data sets from this
filtered list.
</li>
<li>
<b>Execute analysis.</b> After all fields and options are specified, the
analysis may be saved. Upon saving, the analysis is subject to a
validation step. Following validation, the analysis may be started by
clicking the 'Execute' button on the analysis page. When the analysis is
complete, the user is notified by email.
</li>
<li>
<b>Modify existing analysis.</b> An analysis may be modified from the
dashboard by clicking on a completed or pending analysis and selecting
'Update' from the 'Actions' drop-down on the analysis page. From there,
the analysis parameters may be modified. An analysis may also be deleted
by selecting 'Delete' from the 'Actions' drop-down menu and confirming
the selection.
</li>
</ul>
<h3>Analysis visualization</h3>
<p>
ORIO-web provides an intuitive interface for investigating analysis results. The
visualization interface may be accessed for a completed analysis by selecting
that analysis on the dashboard and clicking 'View visualization' on the analysis
page. The results of an ORIO analysis may also be downloaded as a zip file by
selecting 'Download zip' from the 'Actions' drop-down on an analysis page.
</p>
<h4>Dataset clustering, without a sort vector.</h4>
<ul>
<li>
Data sets are hierarchically clustered based on Spearman rho values.
Clustering results are shown as a dendrogram on the left side of the top
panel. Rho values are reported by color in an n-by-n heatmap, where n is
the number of data sets, and are also shown in tooltips when hovering
over individual cells. Clicking on a cell generates a scatterplot of the
points used to derive that Spearman rho value. A drop-down menu allows
individual values to be investigated on a bin-by-bin basis.
</li>
<li>
In the bottom panel, individual data sets may be selected from the list
on the left. Once a data set is selected, the bar plot on the right is
populated with its pairwise Spearman correlation values with every other
data set. Clicking on ‘Display individual heatmap’ opens a pop-up window
detailing the read coverage for that data set over the feature list.
</li>
<li>
In the pop-up, a heatmap of read coverage over the user-specified
genomic window is shown on the right. The upper-left panel shows a plot
of bin-average read coverage, and the mid-left panel shows a plot of
bin-average read coverage for each quartile. Quartiles are generated
respecting the sort order of the read coverage heatmap. The sort order of
the heatmap may be changed using the lower-left panel: after selecting a
data set and clicking ‘Reorder heatmap’, the heatmap is re-ordered by the
read coverage of the selected data set in descending order, i.e., genomic
features with greater read coverage in the selected data set are placed
on top. The quartile plot changes when the read coverage heatmap is
re-ordered. The p-value in the upper-left corner of the quartile plot is
derived by applying the four-sample Anderson-Darling test to the quartile
plots; it tests the null hypothesis that the quartiles are sampled from
identical populations.
</li>
</ul>
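<p>
The quartile plot can be sketched as follows, assuming the heatmap rows are
ordered by each feature's total coverage in the selected data set (descending)
and split into four equal-sized groups, with each group's curve being the
per-bin mean coverage. The function names are hypothetical. The Anderson-Darling
p-value is not reproduced here; SciPy provides a k-sample version as
<code>scipy.stats.anderson_ksamp</code>, but which values ORIO-web passes to the
test is not stated in this manual.
</p>

```python
from typing import List, Sequence


def bin_average(rows: Sequence[Sequence[float]]) -> List[float]:
    """Mean coverage in each bin across the given feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def quartile_profiles(matrix: Sequence[Sequence[float]],
                      sort_key: Sequence[float]) -> List[List[float]]:
    """Per-bin mean coverage for each quartile of features.

    Features are ordered by sort_key in descending order (as after
    'Reorder heatmap'), so the first profile is the top quartile.
    """
    order = sorted(range(len(matrix)), key=lambda i: sort_key[i], reverse=True)
    n = len(order)
    profiles = []
    for q in range(4):
        members = [matrix[i] for i in order[q * n // 4:(q + 1) * n // 4]]
        profiles.append(bin_average(members) if members else [])
    return profiles


if __name__ == "__main__":
    # Eight hypothetical features with two bins each; row i = [i, 2 * i].
    matrix = [[i, 2 * i] for i in range(8)]
    key = [sum(row) for row in matrix]  # order by total coverage
    print(bin_average(matrix))          # overall bin-average curve
    print(quartile_profiles(matrix, key))
```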
<h4>Dataset clustering, with a sort vector.</h4>
<ul>
<li>
Data sets are hierarchically clustered. For each data set, read coverage
is summed within each bin of every feature's genomic window. Then, for
each bin, the Spearman rho value is found between these bin read coverage
sums and the sort vector. For each data set, these correlation values are
concatenated into a single data vector, and the data sets are
hierarchically clustered using the pairwise Euclidean distance between
these vectors. Rho values are displayed by color gradient in an n-by-m
heatmap, where n is the number of data sets and m is the number of
genomic bins. Clicking on a cell generates a scatterplot of the points
used to derive that Spearman rho value.
</li>
<li>
In the bottom panel, individual data sets may be selected from the list
on the left. Once a data set is selected, the bar plot on the right is
populated with its Spearman correlation values for each genomic bin.
Clicking on ‘Display individual heatmap’ opens a pop-up window detailing
the read coverage for that data set over the feature list.
</li>
<li>
In the pop-up, a heatmap of read coverage over the user-specified
genomic window is shown on the right. The upper-left panel shows a plot
of bin-average read coverage, and the mid-left panel shows a plot of
bin-average read coverage for each quartile. Quartiles are generated
respecting the sort order of the read coverage heatmap. The sort order of
the heatmap may be changed using the lower-left panel: after selecting a
data set and clicking ‘Reorder heatmap’, the heatmap is re-ordered by the
read coverage of the selected data set in descending order, i.e., genomic
features with greater read coverage in the selected data set are placed
on top. The quartile plot changes when the read coverage heatmap is
re-ordered. The p-value in the upper-left corner of the quartile plot is
derived by applying the four-sample Anderson-Darling test to the quartile
plots; it tests the null hypothesis that the quartiles are sampled from
identical populations.
</li>
</ul>
<h4>Feature clustering.</h4>
<ul>
<li>
Genomic features are clustered using k-means clustering. For each
genomic feature, the sum of the read coverage is found for each data set.
These sums are then normalized to unit variance within each data set and
concatenated into a data vector for each genomic feature. k-means
clustering is then performed on these data vectors, with centroids
initialized by randomly selecting individual data vectors. k-means
clustering is performed iteratively for k values from 2 to 10.
</li>
<li>
In the 'Feature clustering' view, clustering results are shown on the
heatmap in the right panel. Here, each row corresponds to a genomic
feature, and each column corresponds to a data set. The color of each
cell represents the read coverage at a genomic feature for a data set
after upper-quartile normalization. Columns are ordered based on
hierarchical clustering results, shown as a dendrogram at the top of the
panel. Bars on the left side of the panel reflect cluster membership.
</li>
<li>
In the left panel, k values may be selected from a drop-down list.
Members of the selected cluster are displayed in a list at the bottom of
the left panel. When a genomic feature is selected, it is indicated in
the heatmap by a black arrow, and its values are displayed on the
centroid chart in the bottom panel.
</li>
<li>
In the bottom panel, a two-dimensional plot displays read coverage
values for cluster centroids. Values are upper-quartile normalized. If a
genomic feature is selected in the upper panel, read coverage values for
that feature are plotted as a black line.
</li>
</ul>
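<p>
Upper-quartile normalization can be sketched as dividing each data set's values
by that data set's 75th percentile. The sketch assumes normalization is applied
per data set (per heatmap column), uses the standard library's
<code>statistics.quantiles</code> with <code>method="inclusive"</code> (linear
interpolation, matching NumPy's default percentile), and leaves a data set
unchanged when its upper quartile is zero; these choices and the function name
are assumptions rather than ORIO-web's documented behavior.
</p>

```python
import statistics
from typing import Dict, List, Sequence


def upper_quartile_normalize(
        columns: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Divide each data set's values by its own 75th percentile.

    Each data set needs at least two values for statistics.quantiles.
    """
    normalized: Dict[str, List[float]] = {}
    for name, values in columns.items():
        # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
        q3 = statistics.quantiles(values, n=4, method="inclusive")[2]
        normalized[name] = [v / q3 for v in values] if q3 > 0 else list(values)
    return normalized


if __name__ == "__main__":
    coverage = {"ds1": [1, 2, 3, 4, 5], "ds2": [0, 10, 20, 30, 40]}
    print(upper_quartile_normalize(coverage))
```

<p>
Scaling by the upper quartile rather than the maximum keeps a few very
high-coverage features from compressing the color range of the whole column.
</p>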
</div>
</div>
</div>
<footer class="footer">
<ul id="footer-links">
<li>
<a href="http://www.niehs.nih.gov/" target="_blank">NIEHS</a></li>
<li>
<a href="http://www.niehs.nih.gov/about/od/ocpl/policies/" target="_blank">Web Policies</a></li>
<li>
<a href="http://www.niehs.nih.gov/about/od/ocpl/foia" target="_blank">Freedom of Information Act</a></li>
<li>
<a href="http://oig.hhs.gov/" target="_blank">Inspector General</a></li>
</ul>
<div id="footer-logos">
<a href="http://www.usa.gov/" target="_blank">
<img src="/static/img/usagov.png"
alt="USA.gov is the U.S. government's official web portal to all federal, state, and local government web resources and services"
title="USA.gov: Government Made Easy"></a>
<a href="http://www.hhs.gov/" target="_blank">
<img src="/static/img/hhs.png"
alt="U.S. Department of Health and Human Services"
title="U.S. Department of Health and Human Services"></a>
<a href="http://www.nih.gov/" target="_blank">
<img src="/static/img/nihMasterLogo.png"
alt="U.S. National Institutes of Health"
title="U.S. National Institutes of Health"></a>
</div>
</footer>
<script charset="utf-8"
        src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.1.4/jquery.min.js"></script>
<script charset="utf-8"
        src="//cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.5/js/bootstrap.min.js"></script>
<script charset="utf-8"
        src="//cdnjs.cloudflare.com/ajax/libs/toastr.js/2.1.2/toastr.min.js"></script>

<script charset="utf-8" src="/static/js/site.js"></script>
<script type="text/javascript">
toastr.options = {
    closeButton: true,
    newestOnTop: true,
    positionClass: 'toast-top-right',
    showDuration: 500,
    hideDuration: 500,
    timeOut: 0,
    extendedTimeOut: 0,
};

window.setInterval(function(){
    $.get('/dashboard/poll-messages/', function(d){
        if(d.messages.length>0){
            toastr.clear();
        }
        d.messages.forEach(function(resp){
            toastr[resp.status](resp.message);
        });
    });
}, 60000);
</script>

</body>
</html>