[C++ Toolkit ANNOUNCE] NCBI C++ Toolkit Release (April 2, 2003)

Denis Vakatov vakatov at ncbi.nlm.nih.gov
Wed Apr 16 17:08:26 EDT 2003


The newest release of the NCBI C++ Toolkit is available at:
		  ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/

It includes tarballs for UNIX, MS-Windows and Darwin.

A large (and yet incomplete) list of significant changes in the
Toolkit's API and functionality can be found in:
     ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/RELEASE_NOTES



The current version of RELEASE_NOTES follows:


#############################################################################

             NCBI C++ Toolkit Release (April 2, 2003)


#############################################################################
*** DOWNLOAD

   ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/Apr_2_2003/



#############################################################################
*** CONTENT

Source code archives:
  ncbi_cxx_unix.tar.gz    -- for UNIX'es (see the list of UNIX flavors below)
  ncbi_cxx_mac_cw.sit     -- for MacOS 10.X / CodeWarrior 8.0 Update 8.3
  ncbi_cxx_macosx.tar.gz  -- for MacOS 10.X / GCC 3.1
  ncbi_cxx_win.exe        -- for MS-Windows / MSVC++ 6.0 (self-extracting)
  ncbi_cxx_win.zip        -- for MS-Windows / MSVC++ 6.0


Other:
  RELEASE_NOTES           -- this file
  timestamp               -- when the sources were checked out of the CVS


#############################################################################
***  NEW DEVELOPMENTS -- LIBRARIES


------------------------------------------------------------------
+++++  CoreLib  +++++   (corelib)

1. Redesigned CRef<>:
  A) constructor made explicit,
  B) const CRef<> now returns const reference to the object.

2. Reimplemented CPipe, enabled it for "windows" subsystem under MS-Windows.

3. Multiple fixes and extensions in the NStr:: string manipulation functions.
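
A minimal stand-alone sketch of what item 1 means for calling code. This
is NOT the Toolkit's CRef<>; the names `Counted', `SketchRef' and `Label'
are hypothetical, and the reference counting is reduced to the bare
minimum needed to show an explicit constructor and const propagation.

```cpp
#include <cassert>
#include <string>

// Hypothetical reference-counted base (stand-in, not the Toolkit's class).
class Counted {
public:
    Counted() : m_Count(0) {}
    virtual ~Counted() {}
    void AddRef() { ++m_Count; }
    void Release() { if (--m_Count == 0) delete this; }
    int RefCount() const { return m_Count; }
    Counted(const Counted&) = delete;
    Counted& operator=(const Counted&) = delete;
private:
    int m_Count;
};

// Hypothetical smart pointer illustrating the two changes in item 1.
template <class T>
class SketchRef {
public:
    SketchRef() : m_Ptr(nullptr) {}
    // A) explicit: a raw pointer no longer converts silently.
    explicit SketchRef(T* p) : m_Ptr(p) { if (m_Ptr) m_Ptr->AddRef(); }
    SketchRef(const SketchRef& r) : m_Ptr(r.m_Ptr) {
        if (m_Ptr) m_Ptr->AddRef();
    }
    SketchRef& operator=(const SketchRef& r) {
        if (r.m_Ptr) r.m_Ptr->AddRef();   // add first: safe on self-assignment
        if (m_Ptr) m_Ptr->Release();
        m_Ptr = r.m_Ptr;
        return *this;
    }
    ~SketchRef() { if (m_Ptr) m_Ptr->Release(); }

    // B) a const SketchRef only gives const access to the object.
    T*       operator->()       { assert(m_Ptr); return m_Ptr; }
    const T* operator->() const { assert(m_Ptr); return m_Ptr; }
    T&       operator*()        { assert(m_Ptr); return *m_Ptr; }
    const T& operator*()  const { assert(m_Ptr); return *m_Ptr; }
private:
    T* m_Ptr;
};

// Hypothetical payload class used only for this illustration.
struct Label : Counted {
    std::string text;
    void Set(const std::string& s) { text = s; }
    const std::string& Get() const { return text; }
};

// Receives a const reference, so only const members of Label are reachable.
inline std::string ReadOnly(const SketchRef<Label>& ref) {
    // ref->Set("x");   // would not compile: const ref yields const Label
    return ref->Get();
}
```

With such a design, `SketchRef<Label> r = new Label;' is rejected by the
compiler and has to be spelled `SketchRef<Label> r(new Label);'.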

------------------------------------------------------------------
+++++  Data streaming, Networking, and Dispatching  +++++   (connect)

1. Implemented C++ API for sockets (based on the "C" SOCK API).

2. Extended the "C" SOCK API to support UDP (datagram) sockets.

3. Put in a draft implementation of C++ datagram socket API.

4. These two went to the 'util' library:
  A)  CStreamUtils::Readsome() -- a portable substitute for the
      'istream::readsome' method, which also tries its best to
      behave reasonably when reading from a non-blocking source.
  B)  CStreamUtils::Pushback() -- to put an arbitrary block of data
      "back" to the standard C++ 'istream'.


------------------------------------------------------------------
+++++  Database Connectivity (DBAPI)  +++++   (dbapi)

1. Adapted FreeTDS version 8 (see LICENSE file to use it) to be built
   as a part of the Toolkit (FreeTDS driver -- on UNIX only).

2. Added projects to build DBAPI drivers as DLLs on MS-Win/MSVC++:
     CTLib, DBLib, MSDBLib, and ODBC.


------------------------------------------------------------------
+++++  CGI/FastCGI Framework  +++++   (cgi)

1. Improved and extended logging and statistics.
2. Added "Cookie affinity" support.
3. Allow for a graceful break of FastCGI loop using a "watch file".
4. CCgiApp -- added new methods GetFCgiIteration() and IsFastCGI().
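
The "watch file" idea in item 3 can be sketched as follows. These notes
do not say how the Toolkit detects the change, so this sketch ASSUMES the
loop ends once the file's contents differ from a snapshot taken at
startup. `ReadFile' and `ServeUntilWatchFileChanges' are hypothetical
names, and no real FastCGI call is involved.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Read the whole file; a missing or unreadable file yields an empty string.
inline std::string ReadFile(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical request loop: calls handle(iteration) until the watch file
// changes or max_iterations is reached. Returns the number of requests
// served. The current request always completes before the loop exits.
inline int ServeUntilWatchFileChanges(
    const std::string& watch_path,
    const std::function<void(int)>& handle,
    int max_iterations)
{
    assert(max_iterations >= 0);
    const std::string snapshot = ReadFile(watch_path);
    int served = 0;
    while (served < max_iterations) {
        if (ReadFile(watch_path) != snapshot) {
            break;                     // graceful exit between requests
        }
        handle(served);
        ++served;
    }
    return served;
}
```

The check happens between requests, so an administrator can touch or
rewrite the file to ask a long-running process to exit without
interrupting a request that is already in progress.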


------------------------------------------------------------------
+++++  Data Serialization  +++++   (serial)

1. XML streams now support serialization of objects which are generated by
   a DTD specification (see also +Datatool+ below).

2. Added type-info structures and functions for CTime.


------------------------------------------------------------------
+++++  Object Manager  +++++   (objects/objmgr)

1. Reimplemented CSeqMap class for working with sequence maps.

  A) Performance improved significantly for large segmented sequences.
  B) Sequence map iterator CSeqMap_CI implemented for efficient browsing
     and iteration of sequence map segments.

2. Reimplemented CSeqVector class for working with sequence data.

  A) Sequence map iterator CSeqMap_CI was used for efficient access to 
     data segments.
  B) Performance improved significantly by efficient use of inlined methods.

3. Rewrote several inefficient methods of CHandleRange and
   CHandleRangeMap classes.

4. Changes in CScope:

  A) Optimized communication with CDataSource by adding several 
     data/index caches.
  B) Added priorities to data sources.

5. Rewrote CFeat_CI, CAlign_CI and CGraph_CI:

  A) Performance of feature iterator increased by more than an order of
     magnitude:
    a) CFeat_CI and CGraph_CI return temporary objects instead of mapped
       CSeq_feat or CSeq_graph,
    b) Feature gathering and location mapping is performed in one pass,
    c) Added several cached indexes,
    d) Sorting of features is performed at the end of feature gathering 
       inside resulting vector<>,
    e) Highly optimized feature sorting function.

  B) Implemented lots of additional tuning flags for feature iterator like:
    a) Several levels of sequence map selection for feature gathering,
    b) Several levels of restriction of feature source location,
    c) Different sorting methods.

6. Added top level Seq-entry iterator to CScope (CTSE_CI).

7. Added possibility to remove or replace annotations in seq-entries indexed
   by CDataSource.

8. Added CBioseq_Handle::GetComplexityLevel() method returning Seq-entry of
   required complexity.

9. Renamed test applications and adjusted makefiles to match the source
   file names. Added a script and adjusted makefiles to run loader-related
   tests through different database interfaces (ID1 and PubSeq).
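
Items 5.A.b and 5.A.d above (gather in one pass, sort once at the end
inside the resulting vector<>) can be illustrated with a stand-alone
sketch. `FeatRef', `ByLocation' and `GatherSorted' are hypothetical
names, and the ordering rule (start, then end, then id) is an assumption
made for the example, not the Toolkit's actual feature order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical lightweight record standing in for a feature reference.
struct FeatRef {
    int from;   // start position, inclusive
    int to;     // end position, inclusive
    int id;     // arbitrary identifier used as the last tie-breaker
};

// Assumed ordering: by start, then by end, then by id.
inline bool ByLocation(const FeatRef& a, const FeatRef& b) {
    if (a.from != b.from) return a.from < b.from;
    if (a.to != b.to)     return a.to < b.to;
    return a.id < b.id;
}

// Collect every feature overlapping [range_from, range_to] from all
// sources in a single pass, then sort the resulting vector exactly once.
inline std::vector<FeatRef> GatherSorted(
    const std::vector<std::vector<FeatRef> >& sources,
    int range_from, int range_to)
{
    assert(range_from <= range_to);
    std::vector<FeatRef> out;
    for (const std::vector<FeatRef>& src : sources) {
        for (const FeatRef& f : src) {
            if (f.to >= range_from && f.from <= range_to) {
                out.push_back(f);
            }
        }
    }
    std::sort(out.begin(), out.end(), ByLocation);
    return out;
}
```

Sorting once over a contiguous vector is typically cheaper than keeping
a sorted container up to date while features are still being collected.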


------------------------------------------------------------------
+++++  Alignment Manager  +++++   (objects/alnmgr)

1. Classes for fast and easy access to the segments or chunks of an
   alignment (CAlnMap and related) and the base pairs of each sequence
   (CAlnVec).

2. Classes (CAlnMix and related) for constructing a virtual multiple
   alignment out of a set of input alignments.

NOTE:
   All AlnMgr functionality is based upon the Seq-align and
   Dense-seg ASN.1 specifications.


------------------------------------------------------------------
+++++  Alignment Algorithms  +++++   (algo)

A new library called `algo' has been introduced to facilitate
computationally intensive tasks. The library resides in a
top-level Toolkit tree branch and is built with speed optimization on.

This first release of the library includes several alignment algorithms 
widely used by biologists:

 - CNWAligner:
      The Needleman-Wunsch algorithm producing pairwise global alignments of 
      nucleotide or protein sequences. The implementation uses affine penalty 
      model and optionally supports end space free alignments.

 - CMMAligner:
      Derived from CNWAligner, this class encapsulates Hirschberg's
      divide-and-conquer algorithm (also credited to Myers and Miller), under
      which the amount of space required to run NW becomes a linear
      function of the sequences' lengths. While this is achieved at the
      cost of lower performance, a parallel version of the algorithm, which
      is also provided, can run even faster than the classical NW in a
      multiple-CPU environment.

 - CNWAlignerMrna2Dna:
      As is apparent from its name, this algorithm is specifically
      designed for making spliced alignments. It calculates a global
      alignment that specifically accounts for splice signals in its dynamic
      programming recurrences, resulting in better alignments for these
      particular types of sequences.
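
To make the "affine penalty model" mentioned for CNWAligner concrete,
here is a stand-alone sketch of global alignment scoring with affine
gaps (the three-matrix recurrence usually attributed to Gotoh). It is
not the Toolkit's implementation: `AffineScores' and `GlobalAffineScore'
are hypothetical names, only the score is computed (no traceback), and
end-space-free mode is not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical scoring parameters. A gap of length k costs
// gap_open + k * gap_extend (both values are expected to be <= 0).
struct AffineScores {
    int match;
    int mismatch;
    int gap_open;
    int gap_extend;
};

// Best global alignment score of a and b under the affine gap model.
inline int GlobalAffineScore(const std::string& a, const std::string& b,
                             const AffineScores& s)
{
    assert(s.gap_open <= 0 && s.gap_extend <= 0);
    const int kNegInf = std::numeric_limits<int>::min() / 4;
    const std::size_t n = a.size(), m = b.size();
    typedef std::vector<std::vector<int> > TMatrix;

    // M: a[i-1] aligned to b[j-1]
    // X: a[i-1] aligned to a gap;  Y: b[j-1] aligned to a gap
    TMatrix M(n + 1, std::vector<int>(m + 1, kNegInf));
    TMatrix X = M, Y = M;
    M[0][0] = 0;
    for (std::size_t i = 1; i <= n; ++i)
        X[i][0] = s.gap_open + static_cast<int>(i) * s.gap_extend;
    for (std::size_t j = 1; j <= m; ++j)
        Y[0][j] = s.gap_open + static_cast<int>(j) * s.gap_extend;

    const int open = s.gap_open + s.gap_extend;  // first position of a gap
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int sub = (a[i - 1] == b[j - 1]) ? s.match : s.mismatch;
            M[i][j] = sub + std::max({M[i - 1][j - 1],
                                      X[i - 1][j - 1],
                                      Y[i - 1][j - 1]});
            X[i][j] = std::max({M[i - 1][j] + open,
                                X[i - 1][j] + s.gap_extend,
                                Y[i - 1][j] + open});
            Y[i][j] = std::max({M[i][j - 1] + open,
                                Y[i][j - 1] + s.gap_extend,
                                X[i][j - 1] + open});
        }
    }
    return std::max({M[n][m], X[n][m], Y[n][m]});
}
```

This version uses quadratic memory. The divide-and-conquer approach
described for CMMAligner exists precisely to avoid that cost.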

A sample program is provided to demonstrate the usage of the library, 
though it is also capable enough to be used as a standalone application.


------------------------------------------------------------------
+++++  Object Validator  +++++   (objects/validator)

A complete implementation of a C++ Validator is now available (excluding
validation of alignments). The test_validator application can validate
Seq-entry, Seq-submit and standalone annotation records. It can run in a
batch mode to handle NCBI release files.


------------------------------------------------------------------
+++++  Flat-File Generator +++++   (objects/flat)

There is a new version of flat-file generator, with support for some
additional output formats (EMBL, GBSeq, and tabular as well as GenBank).
NOTE: neither version produces canonical output; use the C Toolkit for that.


------------------------------------------------------------------
+++++  Miscellaneous Libs (additions and improvements)  +++++

1. `seqset'   (objects/seqset)
  -- [CSeq_entry]   Code to read sequences from FASTA files.

2. `seqloc'   (objects/seqloc)
  -- [CSeq_loc]   Caching of heavily used CSeq_loc::GetTotalRange() method.

3. `seqfeat'   (objects/??????)
  -- [CSeq_feat]  Optimized feature sorting methods.

4. `util'   (util)
  -- [logrotate]  Class for automatically-rotated log files.
  -- [strsearch]  Fast string search utilities using Boyer-Moore algorithm
                  and a finite state automaton search.
  -- [range]      Optimized CRange<> and CRangeMap<> classes.

5. `xobjutil'   (objects/util)
  -- [sequence]  a) Converting coordinates between corresponding source and
                    product locations on a feature, and generally determining
                    one location's position relative to another.
                 b) Class CCdregion_translate for rapid translation
                    from a given genetic code, allowing all of the IUPAC
                    nucleotide ambiguity characters.
                 c) Added TestForOverlap() and GetBestOverlappingFeat().
  -- [genbank]   Optimized GenBank formatter.

6. `regexp'   (util/regexp)
  --  PCRE library (see LICENSE) has been embedded into the Toolkit.

7. `html'     (html)
  --  Added code to support Sergey Kurdin's popup menu.
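
As an illustration of the kind of search mentioned under [strsearch] in
item 4, here is a stand-alone sketch of the Boyer-Moore-Horspool
algorithm, a simplified Boyer-Moore variant that uses only the
bad-character shift table. It is not the Toolkit's code, and
`HorspoolFind' is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the offset of the first occurrence of pattern in text,
// or std::string::npos if there is none.
inline std::size_t HorspoolFind(const std::string& text,
                                const std::string& pattern)
{
    const std::size_t n = text.size(), m = pattern.size();
    if (m == 0) return 0;
    if (m > n)  return std::string::npos;

    // shift[c]: how far the window may move when its last character is c.
    // Characters absent from pattern[0..m-2] allow a full shift of m.
    std::vector<std::size_t> shift(256, m);
    for (std::size_t i = 0; i + 1 < m; ++i) {
        shift[static_cast<unsigned char>(pattern[i])] = m - 1 - i;
    }

    std::size_t pos = 0;
    while (pos + m <= n) {
        // Compare the window against the pattern from right to left.
        std::size_t k = m;
        while (k > 0 && text[pos + k - 1] == pattern[k - 1]) {
            --k;
        }
        if (k == 0) return pos;
        const std::size_t step =
            shift[static_cast<unsigned char>(text[pos + m - 1])];
        assert(step >= 1);             // guarantees forward progress
        pos += step;
    }
    return std::string::npos;
}
```

The table is built once per pattern, so the approach pays off when the
same pattern is searched for in a lot of text.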



#############################################################################
***  NEW DEVELOPMENTS -- APPLICATIONS and TOOLS


------------------------------------------------------------------
+++++  NCBI Genome Workbench (GBench)  +++++   (gui/gbench, gui/core, etc)

There is a fledgling GBench project which is maturing rapidly; so far,
however, it has its own release schedule (although it is a part of
the Toolkit). To get more info about the GBench:
   http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/tools/gbench/gbench.html
or contact GBench project leader, Mike DiCuccio.

"GBench:  NCBI Genome Workbench is an application for visualization of
          molecule information (sequences, alignment, molecule features, etc)."


------------------------------------------------------------------
+++++  Datatool  +++++   (serial/datatool)

1. Can now generate serialization code based on a DTD specification.
   While it does not yet support all of DTD features, the most
   widely used features are implemented, and new ones can be added upon
   request.

2. Can now generate simple clients for RPC-style ASN.1- or XML-based
   network services, and has been configured to do so for
   NCBI's ID1, Entrez2, and Medline Archive services.



#############################################################################
***  BUILD FRAMEWORK


------------------------------------------------------------------
+++++  Parallel build w/GMAKE  +++++

gmake -jN should now work properly (though the build system still
doesn't support cross-project parallelization with any version of
make).


------------------------------------------------------------------
+++++  MacOS 10.X / CodeWarrior 8.0 Update 8.3 +++++

Projects include targets to compile with BSD/Apple headers and libraries or
with MSL headers and libraries.


------------------------------------------------------------------
+++++  MS-Win/MSVC project tree  +++++

1. Added project "all_objects_generation" to auto-generate
   serialization source code "on-the-fly" during the build.
   NOTE:  so, we do not have to pack pre-generated serialization code
          to the MS-Win/MSVC distribution archive anymore.

2. Amended the code to be built as DLLs.

3. Created new workspace (and underlying projects) to build "cluster DLLs"
   (i.e. DLLs which implement APIs corresponding to several static libs).



#############################################################################
***  DOCUMENTATION

It has been mostly frozen lately due to its conversion from HTML to
XML format. The reformatted version of the docs will replace the old
one soon.

There is also an interesting development underway -- to use the
DOXYGEN source browser to complement the traditional docs with
structured DOXYGEN-generated "Library Reference" ones based
on the in-source comments.


#############################################################################
***  BACKWARD COMPATIBILITY

There have been quite a lot of API changes since the October release,
mostly in the code generation for serializable objects and in the
ObjectManager related APIs.


------------------------------------------------------------------
+++++  Exceptions  +++++

There is an effort underway to back-fit the Toolkit with
hierarchically organized exception classes, so now some parts of
the code throw these new exceptions rather than e.g. "runtime_error".
Still, the new exception class hierarchy is derived from "std::exception".
For more details, see APPENDIX "BACKWARD_COMPATIBILITY.Exception" attached.


------------------------------------------------------------------
+++++  "External" packages used  +++++

The code has been adapted to work with the newer version of
the NCBI C Toolkit and some 3rd party packages:
  FLTK  -- 1.1.3
  wxWin -- 2.4.0
  FCGI  -- 2.2.0


------------------------------------------------------------------
+++++  Miscellaneous  +++++

1. Changed names for the static-lib versions of the DBAPI drivers on MS Windows
   (added suffix "_static", like:  dbapi_driver_odbc_static.lib)


------------------------------------------------------------------
+++++

So, be prepared to encounter and fight code backward-compatibility issues.



#############################################################################
***  PLATFORMS (OS's, compilers used inside NCBI)


------------------------------------------------------------------
+++++  UNIX  +++++
     Linux,    INTEL,     GCC 3.0.4
     Linux,    INTEL,     ICC 7.0
     Solaris,  SPARC,     WorkShop 6 update 2 C++ 5.3 Patch 111685-13
     Solaris,  SPARC,     GCC 2.95.2
     Solaris,  SPARC,     GCC 3.0.4
     Solaris,  INTEL,     WorkShop 6 update 2 C++ 5.3 Patch 111686-13
     Solaris,  INTEL,     GCC 2.95.2
     IRIX64,   SGI-Mips,  MIPSpro 7.3.1.2m
     FreeBSD,  INTEL,     GCC 3.0.4
     Tru64,    ALPHA,     GCC 2.95.3
     Tru64,    ALPHA,     Compaq C V6.3-029 / Compaq C++ V6.5-014
    

------------------------------------------------------------------
+++++  MS Windows  +++++
     MSVC++ 6.0  Service Pack 5.....


------------------------------------------------------------------
+++++  MacOSX  +++++
     MacOS 10.1,  GCC 3.1 (patched, see "doc/config_darwin.html")
     MacOS 10.2,  GCC 3.1 (patched, see "doc/config_darwin.html")
     MacOS 10.X,  CodeWarrior 8.0 Update 8.3



#############################################################################
***  CAVEATS


------------------------------------------------------------------
+++++  MacOS 10.2 / GCC 3.1  +++++

Toolkit builds okay in all modes, but it runs okay only in Debug, non-DLL mode.

Also, GBench requires DYLD_BIND_AT_LAUNCH to be set. Otherwise it will not
run (will hang at startup waiting for semaphore).

These are GCC-related issues and Apple is aware of them. The next OS release
(expected in September, with a preview available in June) is supposed to
address most of them and introduce many others, since they will be switching
to 64 bit hardware.


------------------------------------------------------------------
+++++  MacOS 10.X / CodeWarrior 8.0 Update 8.3 +++++

1. Not all of the applications are built.

2. GUI-related projects are not built.


------------------------------------------------------------------
+++++  GCC 3.2 +++++

This version has a bug (fixed in 3.2.1) in the C++ iostream implementation,
so some of the code that relies on C++ iostreams (such as serialization
and iostream wrappers for network and other types of connections, as
well as ObjectManager GenBank loader) would not work. Advice: upgrade.



#############################################################################
***  ETC

1. The GCC/X11 configuration on MacOS X behaves (builds and runs)
   just like a regular UNIX:
	   ncbi_cxx_macosx.tar.gz  -> ncbi_cxx_unix.tar.gz





=============================================================================
=============================================================================

APPENDIX


#############################################################################

BACKWARD_COMPATIBILITY:  Exceptions


1.
We are continuing with the effort to make the C++ Toolkit exceptions
more structured -- creating a library/module based hierarchy of
exceptions, such as:


CException
   CCoreException
       CFileException
       CMutexException
       CStringException
       ..........
   CConnectException
       ..........
       ..........
   CXxxException
       ..........
       ..........
   ..........
       ..........
       ..........


1.A) One reason to do this is to allow the upper-level user to easily
     identify (in their code) exactly which library or module has
     thrown the exception, and what was the particular reason for throwing it.
     This is really necessary sometimes.

1.B) Another reason to do this is to give all exceptions used
     inside (and thrown from) the C++ Toolkit more common
     functionality (as described and implemented in the base "CException"
     class).
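
The benefit described in 1.A and 1.B can be sketched with a small
stand-alone hierarchy. The classes below are hypothetical stand-ins
(note the "Sketch" prefix), not the Toolkit's CException family. They
only show how a caller can tell the originating module and error code
apart by type, without parsing the message, while old code that catches
"std::exception" keeps working.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical common base: carries module name, error code and message.
class SketchException : public std::exception {
public:
    SketchException(const std::string& module, int code,
                    const std::string& msg)
        : m_Module(module), m_Code(code), m_What(module + ": " + msg) {}
    const char* what() const noexcept override { return m_What.c_str(); }
    const std::string& GetModule() const { return m_Module; }
    int GetErrCode() const { return m_Code; }
private:
    std::string m_Module;
    int         m_Code;
    std::string m_What;
};

// Hypothetical per-library level of the hierarchy.
class SketchCoreException : public SketchException {
public:
    SketchCoreException(int code, const std::string& msg)
        : SketchException("corelib", code, msg) {}
};

// Hypothetical leaf class with its own error codes.
class SketchFileException : public SketchCoreException {
public:
    enum EErrCode { eNotFound = 1, eNoAccess = 2 };
    SketchFileException(EErrCode code, const std::string& msg)
        : SketchCoreException(code, msg) {}
};

// Stand-in for a library call; it performs no real I/O.
inline void OpenOrThrow(const std::string& path) {
    if (path.empty()) {
        throw SketchFileException(SketchFileException::eNotFound,
                                  "empty path");
    }
}

// The caller picks the level of detail it cares about via catch order.
inline std::string TryOpen(const std::string& path) {
    try {
        OpenOrThrow(path);
    } catch (const SketchFileException& e) {
        return e.GetErrCode() == SketchFileException::eNotFound
                   ? "file:not-found" : "file:other";
    } catch (const SketchException& e) {
        return "module:" + e.GetModule();
    } catch (const std::exception&) {
        return "std";
    }
    return "ok";
}
```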



2.
As a consequence of this development, "C{Errno,Parse}Exception" will be
eliminated altogether. We realize that it may create some backward
compatibility problems for some of your code which relies on these,
but:

2.A) All problems related to the elimination of "C{Errno,Parse}Exception"
     will show up early (during the code compilation), and they can be fixed
     rather quickly -- you just start catching the new, lib/mod-specific
     exceptions (which BTW, still provide the "C{Errno,Parse}Exception" API,
     see [2.C] below).

2.B) Keeping such generic, library/module-independent
     "C{Errno,Parse}Exception" around does no one any good,
     as they provide no information on their origin and specific error
     (except in the err.message, but extracting the needed info from
     the messages is neither reliable nor easy or obvious, as they are
     intended to be used for logging, and not to be parsed by some
     wild-guessing ad hoc message-parsing code).

2.C) There are templates CErrnoTemplException<> and CParseTemplException<>
     to allow the creation of exceptions which would have these specific
     ("errno" and "parse" oriented) APIs, and yet belong to the right
     place in the C++ Toolkit hierarchy.
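
The idea behind 2.C can be sketched as a mix-in template. This is a
stand-alone illustration, not the signature of the Toolkit's
CErrnoTemplException<>: `SketchErrnoException' and `SketchFileError' are
hypothetical names, and the base class is assumed to be constructible
from a std::string.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical mix-in: adds an "errno"-oriented API to any exception
// class TBase, so the result still sits under TBase in the hierarchy.
template <class TBase>
class SketchErrnoException : public TBase {
public:
    SketchErrnoException(const std::string& msg, int err)
        : TBase(msg + ": " + std::strerror(err)), m_Errno(err) {}
    int GetErrno() const { return m_Errno; }
private:
    int m_Errno;
};

// Hypothetical module-specific exception class.
class SketchFileError : public std::runtime_error {
public:
    explicit SketchFileError(const std::string& msg)
        : std::runtime_error("file: " + msg) {}
};

// File-module exception that also carries an errno value.
typedef SketchErrnoException<SketchFileError> SketchFileErrnoError;

// Stand-in for a failing system call wrapper; performs no real I/O.
inline void FailWith(int err) {
    throw SketchFileErrnoError("open failed", err);
}
```

Code that only cares about the module catches `SketchFileError'; code
that needs the errno value catches `SketchFileErrnoError' and calls
GetErrno().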



#############################################################################
    $Date: 2003/04/16 20:57:20 $
#############################################################################

