<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>DATATOOL - NCBI data conversion tool</title>
<!--
$Id: NCBI_data_conversion.html 505605 2016-06-27 17:56:12Z gouriano $
-->
</head>
<body>
<h1>DATATOOL - NCBI data conversion tool</h1>
<h2>Program Description</h2>
<p>
DATATOOL is a utility program that converts data specifications and data
between ASN.1 and XML. It can convert an ASN.1 specification into an
XML DTD or XML schema, a DTD into ASN.1 (with limitations), and a DTD
into an XML schema. Once the specification is known, DATATOOL can also
convert data from ASN.1 to XML, or from XML to ASN.1.
</p>
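<p>
The specification conversions listed above can be sketched as a small
Python helper that only assembles a DATATOOL command line. This is an
illustration, not part of DATATOOL: the helper name
<i>spec_conversion_command</i> is hypothetical, and the <i>-f</i>,
<i>-fx</i> and <i>-fxs</i> options are assumed from the C++ Toolkit
documentation, so check them against your own build.
</p>

```python
# Sketch only: assemble datatool command lines for specification conversion.
# The -f, -fx and -fxs options are assumptions taken from the C++ Toolkit
# documentation; verify them against your datatool build before use.

SPEC_OUTPUT_FLAGS = {
    "asn": "-f",     # write an ASN.1 module file
    "dtd": "-fx",    # write an XML DTD
    "xsd": "-fxs",   # write an XML schema
}


def spec_conversion_command(spec_file, target_format, output_file, datatool="datatool"):
    """Return the argument list that converts spec_file to target_format."""
    if target_format not in SPEC_OUTPUT_FLAGS:
        raise ValueError("unsupported target format: " + target_format)
    return [datatool, "-m", spec_file, SPEC_OUTPUT_FLAGS[target_format], output_file]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without datatool.
    print(" ".join(spec_conversion_command("NCBI_all.asn", "dtd", "NCBI_all.dtd")))
```

<p>
The helper returns a list suitable for <i>subprocess.run</i>, which avoids
shell quoting problems with file names.
</p>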
<p>
DATATOOL is part of the
<a href="https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/">NCBI C++
Toolkit</a>,
which can be freely downloaded from:
<br>
<a href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/</a>
<br>
For more information please refer to:
<br>
<a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a>
</p>
<p>
Prebuilt DATATOOL binaries for some platforms are available at:
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a>
</p>
<h2>Basic instructions</h2>
<p>
DATATOOL can be used to formally convert any ASN.1 or XML data, or any
ASN.1 or XML data specification.
For the list of command-line arguments, please refer to:
<br>
<a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a>
</p>
<h2>Important Note</h2>
<p>
DATATOOL performs only <span style="font-style: italic;">formal</span>
data conversion, so it cannot be used for any additional
processing of the converted data. If you need additional data
processing, you can either:
</p>
<ul>
<li>Write your own code using <a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">DATATOOL</a>
and the <a
href="http://ncbi.github.io/cxx-toolkit/pages/ch_ser">NCBI
serialization library</a> (or any other XML or ASN.1 framework), or</li>
<li>For bio-sequence data in XML, ASN.1 and other formats, use one of
the specialized tools, such as the NCBI ones described at <a
href="https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html">https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html</a>
</li>
</ul>
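<p>
As one illustration of the first option with a general-purpose XML
framework, the following Python sketch counts the elements in an XML
file using the standard library's streaming parser, so the file is not
loaded into memory at once. The function name <i>count_elements</i> and
the miniature sample document are hypothetical; real DATATOOL output is
much larger and more deeply nested.
</p>

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Count element tags in an XML stream without loading it all into memory."""
    counts = Counter()
    for _event, element in ET.iterparse(source, events=("end",)):
        counts[element.tag] += 1
        element.clear()  # drop the element's content once counted to limit memory use
    return counts


if __name__ == "__main__":
    # Hypothetical miniature document standing in for converted XML data.
    sample = io.BytesIO(b"<Bioseq-set><Seq-entry/><Seq-entry/></Bioseq-set>")
    print(count_elements(sample))
```

<p>
<i>count_elements</i> accepts either a file name or an open binary file
object, so it can be pointed directly at an XML file produced by
DATATOOL.
</p>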
<h2>Example</h2>
<p>
Converting a GenBank ASN.1 data file to XML:
</p>
<ol>
<li>Obtain a GenBank ASN.1 data file from:
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/</a>.
The
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/">daily-nc</a>
directory there
contains individual files, in ASN.1 format, with each day's new or
updated entries since the close-of-data for the last GenBank release.
<blockquote>Additional documentation:
<br>
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">/ncbi-asn1/README.asn1</a>
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/README.asn1.daily-nc">/ncbi-asn1/daily-nc/README.asn1.daily-nc</a>
</blockquote>
</li>
<li>Download the appropriate datatool binary for your platform:
<br>
<a
href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a>
</li>
<li>Download the NCBI data specification file:
<br>
<a href="https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn">https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn</a>
</li>
<li>Run the program:
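<pre>./datatool -m NCBI_all.asn -d gbest225.aso -t Bioseq-set -px gbest225.xml</pre>
Here:
<dl>
<dt><i>NCBI_all.asn</i></dt>
<dd>is the name of the NCBI data specification file downloaded in the
previous step</dd>
<dt><i>gbest225.aso</i></dt>
<dd>is the name of the source GenBank data file in ASN.1 binary
format</dd>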
<dt><i>Bioseq-set</i></dt>
<dd>is the name of the data type in the source file</dd>
<dt><i>gbest225.xml</i></dt>
<dd>is the name of the output file in XML format</dd>
</dl>
</li>
</ol>
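<p>
When the conversion has to be repeated for many files, the command from
the last step can be assembled in a script. The Python sketch below uses
only the options shown in the example (<i>-m</i>, <i>-d</i>, <i>-t</i>,
<i>-px</i>); the function names are hypothetical, and the
<i>./datatool</i> path assumes the binary sits in the current directory.
</p>

```python
import subprocess


def asn_binary_to_xml_command(spec_file, data_file, data_type, xml_file, datatool="./datatool"):
    """Build the argument list for converting ASN.1 binary data to XML."""
    return [
        datatool,
        "-m", spec_file,   # data specification file
        "-d", data_file,   # source data in ASN.1 binary format
        "-t", data_type,   # data type stored in the source file
        "-px", xml_file,   # XML output file
    ]


def convert(spec_file, data_file, data_type, xml_file):
    """Run datatool; raises CalledProcessError if it exits with an error."""
    command = asn_binary_to_xml_command(spec_file, data_file, data_type, xml_file)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command rather than run it, so the sketch works without datatool.
    print(" ".join(asn_binary_to_xml_command(
        "NCBI_all.asn", "gbest225.aso", "Bioseq-set", "gbest225.xml")))
```

<p>
Calling <i>convert("NCBI_all.asn", "gbest225.aso", "Bioseq-set",
"gbest225.xml")</i> runs the same command as in the example.
</p>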
<hr>
<h2>PLEASE NOTE:</h2>
<ul>
<li>
The uncompressed XML file is about 10 times the size of the compressed
binary ASN.1 file, so it can be <i>extremely</i> large.
</li>
<li>The ASN.1 data usually contains more than the GenBank files: it
includes records from other databases such as PDB and RefSeq, as well as
the gaps and quality scores of HTG records. For further information, read:
<a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1</a>
</li>
</ul>
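<p>
Because of the size increase noted above, it can help to check the
available disk space before converting. The Python sketch below applies
the rough factor of 10 to the size of the compressed file; the function
names are hypothetical, and the factor is only the approximate ratio
quoted here, not a guarantee.
</p>

```python
import shutil

XML_EXPANSION_FACTOR = 10  # rough ratio quoted above; actual files vary


def estimated_xml_bytes(compressed_asn_bytes, factor=XML_EXPANSION_FACTOR):
    """Estimate the XML output size from the compressed binary ASN.1 size."""
    return compressed_asn_bytes * factor


def has_room_for_xml(compressed_asn_bytes, output_dir="."):
    """Return True if output_dir has at least the estimated space free."""
    free_bytes = shutil.disk_usage(output_dir).free
    return free_bytes >= estimated_xml_bytes(compressed_asn_bytes)
```

<p>
For example, a 250 MB compressed file would be estimated to produce
about 2.5 GB of XML.
</p>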
<hr>
<p>Please email questions to:
<a href="mailto:info@ncbi.nlm.nih.gov">info@ncbi.nlm.nih.gov</a>
</p>
<p>
Last updated: Mar 30, 2006
</p>
</body>
</html>