<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>DATATOOL - NCBI data conversion tool</title> <!-- $Id: NCBI_data_conversion.html 505605 2016-06-27 17:56:12Z gouriano $ --> </head> <body> <h1>DATATOOL - NCBI data conversion tool</h1> <h2>Program Description</h2> <p> DATATOOL is a utility program that converts data specifications and data between ASN.1 and XML. It can convert an ASN.1 specification into an XML DTD or schema, a DTD into ASN.1 (with limitations), and a DTD into an XML schema. Once the specification is known, DATATOOL can also convert data from ASN.1 to XML, or from XML to ASN.1 format. </p> <p> DATATOOL is part of the <a href="https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/">NCBI C++ Toolkit</a>, which can be freely downloaded from: <br> <a href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/CURRENT/</a> <br> For more information, please refer to: <br> <a href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a> </p> <p> Prebuilt DATATOOL binaries for some platforms can be found at: <br> <a href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a> </p> <h2>Basic instructions</h2> <p> DATATOOL can be used to formally convert any ASN.1 or XML data or data specification. For the list of command-line arguments, please refer to: <br> <a href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">http://ncbi.github.io/cxx-toolkit/pages/ch_app</a> </p> <h2>Important Note</h2> DATATOOL performs only <span style="font-style: italic;">formal</span> data conversion, so it cannot be used to perform any additional processing on the converted data. 
If you need additional data processing, you can either:<br> <ul> <li>Write your own code using <a href="http://ncbi.github.io/cxx-toolkit/pages/ch_app">DATATOOL</a> and the <a href="http://ncbi.github.io/cxx-toolkit/pages/ch_ser">NCBI serialization library</a> (or any other XML or ASN.1 framework), or</li> <li>For bio-sequence data in XML, ASN.1, and other formats, use one of the specialized tools, such as the NCBI tools described at <a href="https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html">https://www.ncbi.nlm.nih.gov/Web/Newsltr/V14N1/toolkit.html</a> </li> </ul> <h2>Example</h2> <p> Converting a GenBank ASN.1 data file to XML: </p> <ol> <li>Obtain a GenBank ASN.1 data file from: <a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/</a>. Here, the <a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/">daily-nc</a> directory contains individual files, in ASN.1 format, for each day's new or updated entries since the close-of-data for the last GenBank release. 
<blockquote>Additional documentation: <br> <a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">/ncbi-asn1/README.asn1</a> <br> <a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/daily-nc/README.asn1.daily-nc">/ncbi-asn1/daily-nc/README.asn1.daily-nc</a> </blockquote> </li> <li>Download the appropriate datatool binary for your platform: <br> <a href="ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool">ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/BIN/CURRENT/datatool/</a> </li> <li>Download the NCBI data specification file: <br> <a href="https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn">https://ncbi.nlm.nih.gov/data_specs/asn/NCBI_all.asn</a> </li> <li>Run the program: <pre>./datatool -m NCBI_all.asn -d gbest225.aso -t Bioseq-set -px gbest225.xml</pre> Here: <dl> <dt><i>gbest225.aso</i></dt> <dd>is the name of the source GenBank data file in ASN.1 binary format</dd> <dt><i>Bioseq-set</i></dt> <dd>is the name of the data type in the source file</dd> <dt><i>gbest225.xml</i></dt> <dd>is the name of the output file in XML format</dd> </dl> </li> </ol> <hr> <h2>PLEASE NOTE:</h2> <ul> <li> The uncompressed XML file is about 10 times the size of the compressed binary ASN.1 file, so it can be <i>extremely</i> large. </li> <li>The ASN.1 data usually contains more than the GenBank files: it includes records from other databases such as PDB and RefSeq, as well as the gaps and quality scores of HTG records. For further information, read: <a href="ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1">ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1/README.asn1</a> </li> </ul> <hr> <p>Please email questions to: <a href="mailto:info@ncbi.nlm.nih.gov">info@ncbi.nlm.nih.gov</a> </p> <p> Last updated: Mar 30, 2006 </p> </body> </html>