<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>DATATOOL - NCBI data conversion tool</title>
<!--
$Id: NCBI_data_conversion.html 505605 2016-06-27 17:56:12Z gouriano $
-->
</head>
<body>
<h1>DATATOOL - NCBI data conversion tool</h1>
<h2>Program Description</h2>
<p>
DATATOOL is a utility program that converts data specifications and data
between ASN.1 and XML. It can convert an ASN.1 specification into an XML
DTD or XML schema, a DTD into ASN.1 (with limitations), and a DTD into an
XML schema. Once the specification is known, DATATOOL can also convert
data from ASN.1 to XML, or from XML to ASN.1.
</p>
<p>
DATATOOL is part of the
<a href="https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/">NCBI C++
Toolkit</a>,
which can be freely downloaded from:
<br>
<a href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/</a>
<br>
For more information, please refer to:
<br>
<a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a>
</p>
<p>
Prebuilt DATATOOL binaries for some platforms can be found at:
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a>
</p>
<h2>Basic instructions</h2>
<p>
DATATOOL can be used to formally convert any ASN.1 or XML data or data
specification.
For the list of command-line arguments, please refer to:
<br>
<a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a>
</p>
<h2>Important Note</h2>
<p>
DATATOOL performs only <span style="font-style: italic;">formal</span>
data conversion, so it cannot be used to perform any additional
processing on the converted data. If you need additional data
processing, you can either:
</p>
<ul>
<li>Write your own code using <a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">DATATOOL</a>
and the <a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_ser">NCBI
serialization library</a> (or any other XML or ASN.1 framework), or</li>
<li>Use one of the specialized tools for bio-sequence data in XML, ASN.1,
and other formats, such as the NCBI ones described at <a
href="https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html">https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html</a>
</li>
</ul>
<h2>Example</h2>
<p>
Converting a GenBank ASN.1 data file to XML:
</p>
<ol>
<li>Obtain a GenBank ASN.1 data file from:
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/</a>.
There, the
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/">daily-nc</a>
directory
contains individual files, in ASN.1 format, with each day's new or updated
entries since the close-of-data for the last GenBank release.
<blockquote>Additional documentation:
<br>
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">/ncbi-asn1/README.asn1</a>
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/README.asn1.daily-nc">/ncbi-asn1/daily-nc/README.asn1.daily-nc</a>
</blockquote>
</li>
<li>Download the appropriate datatool binary for your platform:
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a>
</li>
<li>Download the NCBI data specification file:
<br>
<a href="https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn">https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn</a>
</li>
<li>Run the program:
<pre>./datatool -m NCBI_all.asn -d gbest225.aso -t Bioseq-set -px gbest225.xml</pre>
Here:
<dl>
<dt><i>NCBI_all.asn</i></dt>
<dd>is the NCBI data specification file downloaded in the previous
step</dd>
<dt><i>gbest225.aso</i></dt>
<dd>is the name of the source GenBank data file in binary ASN.1
format</dd>
<dt><i>Bioseq-set</i></dt>
<dd>is the name of the data type in the source file</dd>
<dt><i>gbest225.xml</i></dt>
<dd>is the name of the output file in XML format</dd>
</dl>
</li>
</ol>
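<p>
The run step above can also be scripted. The following is a minimal sketch,
assuming Python 3 is available and that only the options shown in the
command above are needed. The helper names <i>datatool_command</i> and
<i>convert_to_xml</i> are hypothetical and introduced here for
illustration only; they are not part of DATATOOL or the NCBI C++ Toolkit.
</p>
<pre>
import subprocess


def datatool_command(spec, source, data_type, xml_out, datatool="./datatool"):
    """Build the argument list for the ASN.1-to-XML conversion shown above.

    Only the options that appear in the example command are used:
    -m (specification file), -d (source data file),
    -t (data type in the source file), -px (XML output file).
    """
    return [datatool, "-m", spec, "-d", source, "-t", data_type, "-px", xml_out]


def convert_to_xml(spec, source, data_type, xml_out, datatool="./datatool"):
    """Run datatool; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(
        datatool_command(spec, source, data_type, xml_out, datatool),
        check=True,
    )
</pre>
<p>
For the file names used in this example, the call would be
<i>convert_to_xml("NCBI_all.asn", "gbest225.aso", "Bioseq-set", "gbest225.xml")</i>.
Passing the arguments as a list avoids shell quoting problems with unusual
file names.
</p>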
<hr>
<h2>PLEASE NOTE:</h2>
<ul>
<li>
The uncompressed XML file is about 10 times the size of the compressed
binary ASN.1 file, so it can be <i>extremely</i> big.
</li>
<li>The ASN.1 data usually contains more than the GenBank files; it
includes other databases such as PDB and RefSeq, the gaps in HTG records,
and the quality scores of HTG records. For further information, read:
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1</a>
</li>
</ul>
<hr>
<p>Please email questions to:
<a href="mailto:info@ncbi.nlm.nih.gov">info@ncbi.nlm.nih.gov</a>
</p>
<p>
Last updated: Mar 30, 2006
</p>
</body>
</html>