<!doctype html public "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<TITLE>NCBI News, September 1995</TITLE>
<META NAME="GENERATOR" CONTENT="Internet Assistant for Word 1.0Z">
<META NAME="AUTHOR" CONTENT="Douglas L. Hunt">
</HEAD>
<BODY bgcolor="#f0f0f0">
<P>
<IMG SRC="/Gifs/newslogo.gif">
<P>
September 1995<HR>
<P>
<B><A NAME="TOC"></A>In This Issue</B>
<P>
<A HREF="#Megabase">GenBank Enters Megabase Era</A> <BR>
<A HREF="#GraphicalView">Entrez Takes Graphical View</A> <BR>
<A HREF="#Taxonomy">GenBank Taxonomy</A> <BR>
<A HREF="#BankIt">BankIt Submissions Mount</A> <BR>
<A HREF="#FAQ">Frequently Asked Questions</A> <BR>
<A HREF="#NCBIFTP">NCBI Data by FTP</A> <BR>
<A HREF="#RecentPubs">Recent Publications</A> <BR>
<A HREF="#GenBank">GenBank Services</A><BR>
<A HREF="#Masthead">Masthead</A><HR>
<H3><A NAME="Megabase"></A>GenBank Enters the Megabase Sequence
Era</H3>
<P>
Large-scale sequencing efforts have already produced a number
of completely sequenced genomes or chromosomes from a variety
of organisms. In making these sequences available, GenBank is
charting a course to best serve the varied needs of the scientific
community. On the one hand, it is scientifically exciting to view
the large-scale organization of very long stretches of contiguous
DNA. On the other hand, most biology still focuses on the
detailed study of individual genes, and a database search that turns
up megabases of DNA surrounding a gene of interest can often be
more of a hindrance than a help.
<P>
<B>New Genome Division</B>
<P>
At the annual International Nucleotide Sequence Database Collaborators'
meeting in April, GenBank, EMBL, and DDBJ agreed on a practical
approach to handling megabase sequences. Rather than creating
single large entries, genome-size submissions will be divided
into several entries, each no longer than 350 kb (350,000 bases). These will
be assigned to the appropriate GenBank division. "Virtual
records" that define the method for assembling the long sequence
will be stored in a new genome division. Retrieval software can
assemble the individual segments so that users can view
the complete genome, chromosome, or other unit of interest on demand.
<P>
The 350-kb limit for any individual database entry is a maximum,
not a recommended size, and was selected so as not to break existing
molecular biology software tools. Submitters are encouraged to
submit entries below this limit, corresponding to the "natural"
units in which the sequencing is done (often cosmid-size pieces)
or entries containing only a few genes.
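<P>
The arithmetic behind the 350-kb ceiling can be illustrated with a short Python sketch that cuts a long sequence into consecutive, non-overlapping pieces no longer than the limit and reports the coordinates of each piece. This is a hypothetical illustration, not NCBI or submission software: the function name and the fixed-size cut are assumptions, and in practice submitters are asked to use natural sequencing units, which are usually much smaller than the maximum.
<PRE>
```python
# Hypothetical sketch (not NCBI software): cut a long sequence into
# consecutive, non-overlapping pieces no longer than the 350,000-base
# maximum agreed for individual database entries.
MAX_ENTRY_LENGTH = 350_000  # maximum bases per entry


def split_into_entries(sequence, max_length=MAX_ENTRY_LENGTH):
    """Return a list of (start, end, piece) tuples.

    start and end are 1-based and inclusive, the convention used in
    GenBank feature locations.
    """
    if max_length < 1:
        raise ValueError("max_length must be positive")
    pieces = []
    for offset in range(0, len(sequence), max_length):
        piece = sequence[offset:offset + max_length]
        pieces.append((offset + 1, offset + len(piece), piece))
    return pieces


if __name__ == "__main__":
    genome = "ACGT" * 200_000  # an 800,000-base toy sequence
    for start, end, piece in split_into_entries(genome):
        print(start, end, len(piece))
```
</PRE>
<P>
Run as a script, the example reports three pieces covering bases 1 to 350,000, 350,001 to 700,000, and 700,001 to 800,000.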
<P>
The sequence database collaboration has already defined the virtual
record format, which contains feature table information on how
to assemble the individual entries into a single contiguous sequence
representing the complete megabase sequence. Each database will
experiment individually with ways to present these sequences to
users this year; the collaborators will then share their experiences
at next year's meeting and determine the optimal path for future development.
<P>
<B>Complete Sequence</B>
<P>
GenBank will be offering the megabase sequences in a number of
forms on our FTP site and through our search services.
<P>
The virtual records will be available in GenBank flatfile format
as well as ASN.1. These records are actually quite small because
they contain no sequence, only information about how to put other
records together to make the megabase sequence. The NCBI will
also instantiate the virtual records by filling in the sequence
and feature tables according to the assembly instructions, creating
a single huge entry. These large composite entries will be available
in FASTA format and GenBank flatfile format on NCBI's FTP site
(ncbi.nlm.nih.gov) in the genbank/genomes directory. The complete
<I>Haemophilus influenzae</I> sequence is currently available
in this directory.
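<P>
A virtual record can be pictured as an ordered list of assembly instructions, each naming a component entry, the span of that entry to use, and its orientation. The Python sketch below expands such a list into one contiguous sequence, which is the step described above as instantiating a virtual record. The field names, the toy entry identifiers, and the dictionary that stands in for database retrieval are hypothetical; the sketch does not reproduce the actual record format defined by the database collaboration.
<PRE>
```python
from dataclasses import dataclass


@dataclass
class Component:
    """One assembly instruction in a hypothetical virtual record."""
    entry_id: str               # identifier of the component entry
    start: int                  # first base to use (1-based, inclusive)
    end: int                    # last base to use (1-based, inclusive)
    minus_strand: bool = False  # use the reverse complement of the span


COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def instantiate(virtual_record, fetch):
    """Build one contiguous sequence from a list of Components.

    fetch(entry_id) must return the full sequence of a component entry.
    """
    parts = []
    for comp in virtual_record:
        span = fetch(comp.entry_id)[comp.start - 1:comp.end]
        if comp.minus_strand:
            span = span.translate(COMPLEMENT)[::-1]  # reverse complement
        parts.append(span)
    return "".join(parts)


if __name__ == "__main__":
    entries = {"SEG1": "AAAACCCC", "SEG2": "GGGGTTAC"}  # toy component entries
    record = [
        Component("SEG1", 1, 8),
        Component("SEG2", 5, 8, minus_strand=True),
    ]
    print(instantiate(record, entries.__getitem__))  # AAAACCCCGTAA
```
</PRE>
<P>
The example joins all of SEG1 with the reverse complement of bases 5 to 8 of SEG2. Passing the retrieval step in as a function keeps the assembly logic independent of where the component entries are stored.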
<P>
<B>Graphical Views</B>
<P>
Network Entrez users will have the added functionality of a graphical
sequence viewer. This new feature will present graphical views
of sequences, and any associated genetic and physical maps, on
demand. Users will be able to view a schematic of the whole megabase
sequence and then select any subregion for a more detailed graphical
view of the biological features annotated on it.
<P>
These and other new views and services will become available to
the public over the next year. We encourage your comments and
suggestions during this period.
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="GraphicalView"></A>New Entrez Release Takes Graphical
View</H3>
<P>
The October release of Network Entrez and Entrez on CD-ROM will
include a graphical viewer for displaying the locations of features
annotated on sequences. A graphical overview is helpful for understanding
large or complicated sequence entries and is much easier to interpret
than a list of numerical positions shown in text report formats.
The capability is essential for viewing sequences of entire chromosomes
and genomes, and associated maps, available through the new genome
division in Network Entrez. Although the genome division is not
accessible from the CD-ROM, all other graphical viewing functions
are available there.
<P>
A tabbed folder approach to selecting alternative report formats
has also been added, making it very easy to move quickly between
text and graphical display formats.
<P>
<B>3D Structure</B>
<P>
For Network Entrez users, the new release also includes an explicit
3D structure database derived from crystallographic and NMR data
in PDB, the Brookhaven Protein Data Bank. As with the sequence
and bibliographic databases, the structure database may be queried
directly, using specific fields such as author names or text terms,
to check for structure data on a specific protein or nucleic acid.
Structure data may then be viewed in 3D, with real-time rotation,
using the public domain graphics programs RasMol or Kinemage.
Entrez itself simply writes structure documents in the format
required by these programs. Future versions, however, will invoke
an integrated 3D structure viewer directly from Network Entrez.
The graphical interface is not yet available in WebEntrez.
<P>
<B>Daily Updates</B>
<P>
Network Entrez and WebEntrez are now updated daily with newly
released GenBank, EMBL, DDBJ, and GSDB records. New entries can
be retrieved by searching on any of the Entrez data fields. Sequence
neighbors, however, lag slightly behind the availability of the
new records themselves because of the extensive processing required;
they are currently updated weekly. Protein sequence
entries from SwissProt, PIR, PDB, and PRF are added whenever NCBI
obtains their public releases. The MEDLINE subset is updated weekly.
<P>
<B>No More Registration</B>
<P>
It is no longer necessary to register your computer's IP address
before using Network Entrez. However, users at sites that are
already registered will still see the name of their local administrator
when they connect. For assistance with network access problems,
please continue to consult your local systems support staff first.
Contact NCBI for bug reports or assistance with using the
features of Network Entrez.
<P>
<B>Links to JBC Online</B>
<P>
WebEntrez now contains links to and from JBC Online, the online
version of the <I>Journal of Biological Chemistry</I>, beginning
with the April 14, 1995, issue. Starting from WebEntrez, select
the MEDLINE data set, then locate a record that was published
in JBC. Click on the JBC button to link to JBC Online and see
the full text of the article.
<P>
Starting from JBC Online, links to GenBank are available in articles
that report a new sequence. When a linked accession number appears
in an article, the GenBank link is highlighted. Click on the link
to connect to Entrez and see the full GenBank record. Links are
also available from many of the references in JBC Online articles.
Click on any reference that includes a MEDLINE link to connect
to Entrez and see the MEDLINE abstract.
<P>
As other electronic journals become available online, NCBI intends
to make similar links. JBC Online is available from The Highwire
Press through its WWW site (http://highwire.stanford.edu/jbc).
This service is still in the development stage, and access is
free of charge for a trial period.
<P>
<B>CD-ROM Expands to Five Discs</B>
<P>
The December 1995 release of Entrez on CD-ROM will require five
discs. Because of the influx of EST data, the sequence databases are
growing at a faster rate than was anticipated a year ago. A price
increase will accompany each expansion to an additional disc.
<P>
The NCBI encourages subscribers to switch to either Network Entrez
or WebEntrez. For more information on Internet versions of Entrez,
contact info@ncbi.nlm.nih.gov.
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="Taxonomy"></A>GenBank Taxonomy: Is a Rabbit a Fish?</H3>
<P>
Users are starting to notice the taxonomy changes in GenBank.
These changes result from a project started 2 years ago to build a
uniform, phylogenetically based taxonomy for GenBank. Because
the taxonomy is based on phylogenetics, some of the relationships
appear unusual at first glance, such as the inclusion of humans
and rabbits under bony fish.
<P>
In the current classification, for example, the Gnathostomata
(jawed vertebrates) include all vertebrates except the lampreys
and hagfish; the Osteichthyes (bony vertebrates) include all Gnathostomata
except the cartilaginous fish; and the Sarcopterygii (lobe-finned
fishes) include all Osteichthyes except the ray-finned fishes.
Because the tetrapods, including humans and rabbits, descend from
the lobe-finned lineage, they fall within Osteichthyes as well.
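<P>
The nesting above is easiest to see by walking up a tree. In the Python sketch below, each taxon points to its parent, and a taxon belongs to a group exactly when that group appears in its lineage; this is why rabbits and humans end up under Osteichthyes. The tree is a tiny, hand-written excerpt with most intermediate ranks omitted (for example, the tetrapod and amniote levels) and is not GenBank taxonomy data.
<PRE>
```python
# Hypothetical, hand-written excerpt of a phylogenetic classification.
# Each taxon maps to its parent; most intermediate ranks are omitted.
PARENT = {
    "Vertebrata": None,
    "Petromyzon marinus": "Vertebrata",     # sea lamprey (jawless)
    "Gnathostomata": "Vertebrata",          # jawed vertebrates
    "Chondrichthyes": "Gnathostomata",      # cartilaginous fish
    "Osteichthyes": "Gnathostomata",        # bony vertebrates
    "Actinopterygii": "Osteichthyes",       # ray-finned fishes
    "Danio rerio": "Actinopterygii",        # zebrafish
    "Sarcopterygii": "Osteichthyes",        # lobe-finned fishes and tetrapods
    "Mammalia": "Sarcopterygii",
    "Homo sapiens": "Mammalia",
    "Oryctolagus cuniculus": "Mammalia",    # rabbit
}


def lineage(taxon):
    """Return the path from taxon up to the root, starting with taxon."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def is_within(taxon, group):
    """True if group is taxon itself or one of its ancestors."""
    return group in lineage(taxon)


if __name__ == "__main__":
    print(list(reversed(lineage("Oryctolagus cuniculus"))))
    print(is_within("Oryctolagus cuniculus", "Osteichthyes"))  # True
    print(is_within("Petromyzon marinus", "Gnathostomata"))    # False
```
</PRE>
<P>
The first line printed is the rabbit's lineage from Vertebrata down through Gnathostomata, Osteichthyes, Sarcopterygii, and Mammalia, which is the sense in which a phylogenetic classification places a rabbit among the bony fish.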
<P>
<B>Phylogenetic Approach</B>
<P>
The impetus for the project was the need for a consistent and
comprehensive sequence-based taxonomy to process and query the
sequence databases built at NCBI. With the support of the international
collaborating databases, EMBL and DDBJ, the GenBank taxonomy was
developed by merging and unifying taxonomic data from a variety
of sources. The project is not intended to produce an international
standard or official classification, but specifically to support
the sequence databases.
<P>
Another important factor was the relevance of taxonomic relationships
to sequence similarity. Because a strictly phylogenetic approach
reflects evolutionary history more closely than does classical
taxonomy, it is well suited for applications associated with sequence
databases. Users may, for example, want to determine the
level of specificity of a particular probe or to identify a
distantly related organism that has the same gene that they have
isolated.
<P>
Entrez now includes a taxonomy search mode, which can be used
to explore the GenBank classification and perform tree-based retrieval
of sequence data. Users can retrieve sequences based on scientific
name and hierarchical classification, then browse upward and downward
through the phylogenetic tree to retrieve sequences from related
taxa.
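<P>
Browsing downward and retrieving sequences from related taxa can be sketched as collecting every node below a chosen taxon and gathering the sequences attached to those nodes. The Python example below does this breadth-first on a toy child list; the tree excerpt, the sequence identifiers, and the function names are hypothetical and are not meant to describe how Entrez stores or searches its taxonomy.
<PRE>
```python
from collections import deque

# Hypothetical toy taxonomy: parent -> children (intermediate ranks omitted).
CHILDREN = {
    "Osteichthyes": ["Actinopterygii", "Sarcopterygii"],
    "Actinopterygii": ["Danio rerio"],
    "Sarcopterygii": ["Mammalia"],
    "Mammalia": ["Homo sapiens", "Oryctolagus cuniculus"],
}

# Hypothetical index: organism -> identifiers of its sequence entries.
SEQUENCES = {
    "Danio rerio": ["seq-zf-1"],
    "Homo sapiens": ["seq-hs-1", "seq-hs-2"],
    "Oryctolagus cuniculus": ["seq-oc-1"],
}


def subtree(taxon):
    """Breadth-first list of taxon and every node below it."""
    found, queue = [], deque([taxon])
    while queue:
        node = queue.popleft()
        found.append(node)
        queue.extend(CHILDREN.get(node, []))
    return found


def sequences_under(taxon):
    """Sequence identifiers attached to taxon or any of its descendants."""
    return [sid for node in subtree(taxon) for sid in SEQUENCES.get(node, [])]


if __name__ == "__main__":
    print(sequences_under("Mammalia"))      # ['seq-hs-1', 'seq-hs-2', 'seq-oc-1']
    print(sequences_under("Osteichthyes"))  # ['seq-zf-1', 'seq-hs-1', 'seq-hs-2', 'seq-oc-1']
```
</PRE>
<P>
Moving the starting point up the tree widens the result: starting from Mammalia returns only the human and rabbit entries, while starting from Osteichthyes also picks up the zebrafish entry.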
<P>
<B>15,500 Species in GenBank</B>
<P>
The taxonomic relationships are based on publications wherever
possible, and literature references are provided so that users
can independently assess the logic of the GenBank
classification. To be comprehensive, the tree must include every
organism in GenBank. In this regard, the taxonomy
is driven by the organisms being sequenced rather than by all organisms
that exist. As of August, GenBank contained 1,920 family nodes, 5,588
genus nodes, 15,511 species nodes, and 1,903 nodes below the species
level. An average of 10 new organisms are added each day.
<P>
Three NCBI scientists with experience in taxonomy and molecular biology
(Scott Federhen, Andrzej Elzanowski, and Detlef Leipe) maintain the taxonomy
internally. Additionally, outside molecular biologists and taxonomists
serve as curators and provide expert review and consultation (see
the list below). The list of advisors will continue to grow over the next
few months.
<P>
<B>Contributors to Taxonomy Project</B>
<P>
Michael Ashburner, European Bioinformatics Institute: <I>dipterans</I>
<BR>
Gerhard Baechli, University of Zurich: <I>dipterans</I>
<BR>
James G. Baldwin, University of California at Riverside: <I>nematodes</I>
<BR>
Meredith Blackwell, Louisiana State University: <I>fungi</I>
<BR>
Bruce Campbell, Agricultural Research Service, USDA: <I>true bugs</I>
<BR>
Russell Chapman, Louisiana State University: <I>green algae</I>
<BR>
Douglas Eernisse, California State University: <I>metazoa</I>
<BR>
Mark Farmer, University of Georgia: <I>euglenoids, kinetoplastids,
and trichomonads</I>
<BR>
Kristian Fauchald, Smithsonian Institution: <I>polychaetes</I>
<BR>
Suzanne Fredericq, Smithsonian Institution: <I>red algae</I>
<BR>
Wilson Freshwater, University of Miami: <I>red algae</I>
<BR>
Walter Gams, Centraalbureau voor Schimmelcultures (The Netherlands):
<I>fungi</I>
<BR>
Gerald J. Gastony, Indiana University: <I>ferns</I>
<BR>
William J. Hahn, Smithsonian Institution: <I>flowering plants</I>
<BR>
William C. Hart, Jr., Smithsonian Institution: <I>decapod crustaceans</I>
<BR>
David Hillis, University of Texas: <I>chordates</I>
<BR>
Eugene Koonin, NCBI: <I>viruses</I>
<BR>
Phil Lambert, Royal British Columbia Museum: <I>sea cucumbers</I>
<BR>
Jon L. Norenburg, Smithsonian Institution: <I>ribbon worms</I>
<BR>
Richard Olmstead, University of Colorado: <I>dicotyledons</I>
<BR>
Gary Olsen, University of Illinois at Urbana-Champaign: <I>bacteria</I>
<BR>
David Patterson, University of Sydney: <I>stramenopiles</I>
<BR>
Norman Pieniazek, Centers for Disease Control and Prevention:
<I>microsporidians</I>
<BR>
Norman Platnick, American Museum of Natural History: <I>spiders</I>
<BR>
Jerry Powell, University of California at Berkeley: <I>moths</I>
<BR>
Harry M. Savage, Centers for Disease Control and Prevention: <I>mosquitoes</I>
<BR>
Jeffrey Jon Shaw, Belem Research Project (Brazil): <I>leishmanias</I>
<BR>
David Sissom, West Texas A&amp;M University: <I>scorpions</I>
<BR>
Alan R. Smith, University of California at Berkeley: <I>ferns</I>
<BR>
Mitchell Sogin, Marine Biological Laboratory at Woods Hole: <I>protists</I>
<BR>
Felix Sperling, University of California at Berkeley: <I>butterflies</I>
<BR>
John Taylor, University of California at Berkeley: <I>fungi</I>
<BR>
Robert Van Syoc, California Academy of Sciences: <I>cirripeds</I>
<BR>
Steven J. Wagstaff, Landcare Research New Zealand Ltd.: <I>dicotyledons</I>
<BR>
George R. Zug, Smithsonian Institution: <I>reptiles</I>
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="BankIt"></A>BankIt Submissions Mount</H3>
<P>
BankIt, the World Wide Web (WWW) tool for submitting sequences
to GenBank, has been used to submit more than 7,000 GenBank entries
and now accounts for more than two-thirds of new submissions each
month. Since its introduction this past February, a number of
improvements have been made to meet the needs expressed by our
users.
<P>
<B>More Than 30,000 Bases</B>
<P>
BankIt now accepts sequences longer than 30,000 nucleotides. Although
most, if not all, WWW browsers still limit each input window to
approximately 30,000 characters, BankIt circumvents
this by first asking how many nucleotides you intend to submit.
The appropriate number of DNA sequence input windows, each with
a capacity of about 30,000 characters, is then incorporated into
the BankIt submission form.
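<P>
The number of input windows BankIt needs is a ceiling division of the announced sequence length by the capacity of one window. A minimal Python sketch, assuming a capacity of exactly 30,000 characters (the article gives the browser limit only approximately) and using hypothetical function names:
<PRE>
```python
# Assumed capacity of one browser input window, in characters.
WINDOW_CAPACITY = 30_000


def windows_needed(n_nucleotides, capacity=WINDOW_CAPACITY):
    """Smallest number of windows that can hold n_nucleotides characters."""
    if n_nucleotides < 0 or capacity < 1:
        raise ValueError("length must be non-negative and capacity positive")
    return -(-n_nucleotides // capacity)  # integer ceiling division


def fill_windows(sequence, capacity=WINDOW_CAPACITY):
    """Split a sequence into the chunks each input window would receive."""
    return [sequence[i:i + capacity] for i in range(0, len(sequence), capacity)]


if __name__ == "__main__":
    for length in (30_000, 30_001, 80_000, 350_000):
        print(length, windows_needed(length))
```
</PRE>
<P>
Under these assumptions, an 80,000-nucleotide sequence needs three windows, and a sequence at the 350,000-nucleotide record limit needs twelve. Integer ceiling division avoids any floating-point rounding at exact multiples of the window size.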
<P>
Note that there is still an upper limit of 350,000 nucleotides
for individual GenBank records, as agreed by the international
collaboration of DNA sequence databases (see <A
HREF="#Megabase">"GenBank Enters the Megabase Sequence Era"</A>).
Sequences longer than 350,000 nucleotides should be broken down into smaller
segments and submitted as separate entries that will be linked
together by software.
<P>
<B>Your BankIt ID Number</B>
<P>
If your Web client crashes or you forget to save a copy of your
submission file, all is not lost. NCBI maintains a BankIt transaction
log and assigns an identification number to each BankIt submission.
If you ever need to retrieve an incomplete submission, just
tell us your BankIt ID number. We will e-mail the submission to
you in HTML format; you can then reload it into BankIt and complete
it. Note that your BankIt ID is not your GenBank accession number.
<P>
<B>Updating Existing Entries</B>
<P>
BankIt can now be used to modify or update any of your own GenBank
records, regardless of whether BankIt was used to submit them
originally. Choose Update from the BankIt opening screen, then
enter your accession number. If the record is in the public release
of GenBank, BankIt will display it. If your record is being held
confidential, it will not be displayed, but you can still specify
the modifications to be made. If you wish to modify
a very recent submission for which you do not yet have an accession
number, you can use your BankIt ID number instead.
<P>
<B>Saving a BankIt File</B>
<P>
A new button called Save This Form has been added at the bottom
of the BankIt submission form. Saving a copy of your submission
is useful if you have several similar sequences to submit, or
if you want to save an incomplete submission and come back to
it later. When you have completed each BankIt submission, we recommend
that you save a copy in HTML format for your records.
<P>
To save a submission form, press the Save This Form button, then
click on BankIt. Netscape and MacWeb users will be prompted by
their browser to enter a file name, and the file will then be saved
automatically in HTML format on their local system. MacWeb users
need to include .html as the filename extension. Saving is not
completely automatic for Unix-based Mosaic users. They need to
press Save This Form, then use Mosaic's "Save As" feature
to name and save the file in HTML format. When saving is complete,
all users should turn the Save This Form button off by clicking
on it again, then press the BankIt button to continue.
<P>
<B>Submission Tips</B>
<P>
<I><B>Change Status to Submit!</B></I>
<P>
The most important tip is to change the status of your submission
from Modify to Submit before clicking the BankIt button for the
last time. Otherwise, your submission remains incomplete. You
will know that your submission is complete when the BankIt window
displays a thank-you message. You will also receive an e-mail
acknowledgment thanking you for your submission.
<P>
<I><B>Don't Forget the Annotations!</B></I>
<P>
The initial BankIt form does not provide for biological annotations.
However, once you enter the initial information and click the
BankIt button, you will have an opportunity to review your entry
and specify the number of coding regions, structural RNA features,
or other biological features you wish to add. Click the BankIt
button again, and you'll be able to enter the additional information
at the end of your original BankIt submission form.
<P>
<B>Help</B>
<P>
If you have any questions on using BankIt, contact GenBank User
Services at info@ncbi.nlm.nih.gov or at (301) 496-2475.
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="FAQ"></A>Frequently Asked Questions</H3>
<P>
<I>I recently used BankIt to submit a sequence to GenBank, but
I haven't received any confirmation. My BankIt number was 12345.
Should I do anything else?</I>
<P>
Contact GenBank User Services, and they will check the BankIt transaction
log to confirm that a completed submission was received. Incomplete
submissions can be retrieved and e-mailed to you for completion.
Note that if you do not explicitly select the Submit to GenBank option
on the BankIt Revision Page before pressing the BankIt button for the
last time, your submission remains incomplete.
<P>
<I>A previously unmapped EST maps to the region I'm working in,
assuming no duplications. Should I send this information to dbEST?</I>
<P>
Yes, dbEST does accept mapping data for EST sequences submitted
by someone else. Basically, you will submit four small files:
one for your contact information, one for the mapping method,
one for a citation to the mapping method, and one for the map
data itself. NCBI does have special formatting requirements for
EST data, so obtain detailed instructions from info@ncbi.nlm.nih.gov.
<P>
<I>When using the RETRIEVE e-mail server, how can I get around
the 1,000- and 50,000-line limits on my output?</I>
<P>
The MAXLINES command allows you to change the default limit of
1,000 lines to any number up to the maximum of 50,000. The STARTDOC
command allows you to obtain output in several batches. This can
be used to circumvent the 50,000-line maximum as well as any limits
on the size of mail messages you are able to receive at your end.
<P>
<I>With the BLAST e-mail server, is there any way to be sure you
got my search and to find out how long the queue is?</I>
<P>
The new ACKNOWLEDGE command allows you to receive a notice after
any length of time you specify. If your search is not completed
within that time period, the BLAST server will send an e-mail
message informing you of your position in the processing queue.
See the BLAST documentation for more detail.
<P>
<I>How can I find out the total number of entries and nucleotides
in GenBank?</I>
<P>
These numbers are published at the beginning of the GenBank Release
Notes prepared for each release and are available from NCBI's
anonymous FTP site (ncbi.nlm.nih.gov) in the directory "genbank".
The name of the file is gbrel.txt.
<P>
<I>Is the full sequence of </I>Haemophilus influenzae<I> available?</I>
<P>
Yes, the full contiguous sequence is available on the NCBI FTP
site in the genbank/genomes directory. The sequence is presented
in both FASTA format (1.8MB) and GenBank flat-file format (3.8MB),
as both compressed and uncompressed files.
<P>
<I>Authorin doesn't work with my new Mac. What's wrong?</I>
<P>
Authorin runs only with 24-bit addressing. If your Mac allows
you to select 24-bit or 32-bit mode, select 24-bit mode, then
restart before using Authorin. If you have a newer Mac that only
uses 32-bit addressing, you'll need to switch to BankIt on the
World Wide Web.
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="NCBIFTP"></A>NCBI Data by FTP</H3>
<P>
The NCBI FTP site contains a variety of directories with publicly
available databases and software. The available directories include
"repository", "genbank", "entrez",
"toolbox", and "pub".
<P>
The <B>repository</B> directory makes a number of molecular biology
databases available to the scientific community. This directory
includes databases such as PIR, SwissProt, CarbBank, AceDB, and
FlyBase.
<P>
The <B>genbank</B> directory contains files with the latest full
release of GenBank, the daily cumulative updates, and the latest
release notes.
<P>
The <B>entrez</B> directory contains the Entrez executable programs
for accessing CD-ROM data on a variety of platforms. It also contains
client software for Network Entrez.
<P>
The <B>toolbox</B> directory contains a set of software and data
exchange specifications that are used by NCBI to produce portable
software, and includes ASN.1 tools and specifications for molecular
sequence data.
<P>
The <B>pub</B> directory offers public domain software, such as
BLAST (sequence similarity search program), MACAW (multiple sequence
alignment program), and Authorin submission software for Mac and
PC systems. Client software for Network BLAST is also included
in this directory.
<P>
Data in these directories can be transferred through the Internet
by anonymous FTP. To connect, type <B>ftp ncbi.nlm.nih.gov</B>
or <B>ftp 130.14.25.1</B>. Enter <B>anonymous</B> for the login
name, and enter your e-mail address as the password. Then change
to the appropriate directory. For example, change to the repository
directory (<B>cd repository</B>) to download specialized databases.
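<P>
The same anonymous session can also be scripted. The sketch below uses the ftplib module from the Python standard library. The host and directory names come from this article, the e-mail address is a placeholder, and the ftp_factory parameter is a hypothetical hook added so that a stand-in object can replace the real connection when testing. The directory layout described here dates from 1995, so treat the example as an illustration of the login steps rather than a recipe for the current site.
<PRE>
```python
from ftplib import FTP

HOST = "ncbi.nlm.nih.gov"        # host named in the article
EMAIL = "your.name@example.org"  # placeholder: use your own address


def list_directory(directory, host=HOST, email=EMAIL, ftp_factory=FTP):
    """Log in anonymously, change to directory, and return its file names."""
    with ftp_factory(host) as ftp:                  # connect; quit() on exit
        ftp.login(user="anonymous", passwd=email)   # e-mail address as password
        ftp.cwd(directory)                          # e.g. "repository"
        return ftp.nlst()                           # names in that directory
```
</PRE>
<P>
Calling list_directory("repository") would log in as anonymous, change to the repository directory, and return the names of the files there, mirroring the interactive steps above.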
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H3><A NAME="RecentPubs"></A>Selected Recent Publications by NCBI
Staff</H3>
<P>
<B>Baxevanis, AD, SH Bryant</B>, and <B>D Landsman</B>. Homology
model building of the HMG1 box structural domain. <I>Nucleic Acids
Res</I> 23(6):1019-29, 1995.
<P>
<B>Baxevanis, AD</B>, and <B>D Landsman</B>. The HMG1 box protein
family: classification and functional relationships. <I>Nucleic
Acids Res</I> 23(9):1604-13, 1995.
<P>
<B>Boguski, MS</B>. Molecular medicine: hunting for genes in computer
data bases. <I>N Engl J Med</I> 333(10):645-47, 1995.
<P>
<B>Boguski, MS</B>, and <B>GD Schuler</B>. ESTablishing a human
transcript map. <I>Nat Genet</I> 10:369-71, 1995.
<P>
<B>Bryant, SH</B>, and <B>SF Altschul</B>. Statistics of sequence-structure
threading. <I>Curr Opin Struct Biol</I> 5:236-44, 1995.
<P>
Bussey, H, DB Kaback, W Zhong, DT Vo, MW Clark, N Fortin, <B>BFF
Ouellette</B>, R Keng, AB Barton, Y Su, CK Davies, and RK Storms.
The sequence of chromosome I from <I>Saccharomyces cerevisiae</I>. <I>Proc
Natl Acad Sci USA</I> 92:3809-13, 1995.
<P>
Castonguay, LA, <B>SH Bryant</B>, PW Snow, and JS Fetrow. A proposed
structural model of domain 1 of fasciclin II neural cell adhesion
protein based on an inverse folding algorithm. <I>Protein Sci</I>
4:472-83, 1995.
<P>
Klaassen, VA, M Boeshore, <B>EV Koonin</B>, T Tian, and BW Falk.
Genome structure and phylogenetic analysis of lettuce infectious
yellows virus, a whitefly-transmitted, bipartite closterovirus.
<I>Virology</I> 208:99-110, 1995.
<P>
<B>Landsman, D</B>, and AP Wolffe. Common sequence and structural
features in the heat shock factor and ets families of DNA-binding
domains. <I>Trends Biochem Sci</I> 20(6):225-6, 1995.
<P>
Rouviere, PE, A De Las Penas, J Mecsas, CZ Lu, <B>KE Rudd</B>,
and CA Gross. rpoE, the gene encoding the second heat shock sigma
factor, sigmaE, in <I>Escherichia coli</I>. <I>EMBO J</I> 14:1032-42,
1995.
<P>
Sanderson, KE, A Hessel, and <B>KE Rudd</B>. Genetic map of <I>Salmonella
typhimurium</I>, edition VIII. <I>Microbiol Rev</I> 59:241-303, 1995.
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<H2><A NAME="FAQ"></A><A NAME="GenBank"></A>GenBank: Easy Deposits,
Unlimited Withdrawals, High Interest</H2>
<P>
It's easy and free to contribute sequences to GenBank and to search
the database. The list below summarizes the data submission and search
services available from NCBI.
<P>
<B>Information</B>
<P>
<I>Purpose:</I> Obtain general information about NCBI databases
and services.<BR>
<I>How To Use/How To Get Help:</I> Send e-mail to info@ncbi.nlm.nih.gov
or call GenBank User Services at (301) 496-2475.
<P>
<B>GenBank submissions</B>
<P>
<I>Purpose:</I> Submit new sequences to GenBank.<BR>
<I>How To Use/How To Get Help:</I> For information or technical
assistance: info@ncbi.nlm.nih.gov
<P>
<I>Service:</I> Authorin software<BR>
<I>Purpose:</I> Prepare a new or updated GenBank entry.<BR>
<I>How To Use/How To Get Help:</I> Send a new submission by e-mail to:
gbsub@ncbi.nlm.nih.gov<BR>
To obtain software for Mac or PC, send a request to: authorin@ncbi.nlm.nih.gov
<P>
<I>Service:</I> BankIt on WWW<BR>
<I>Purpose:</I> Prepare and submit a new GenBank entry over the
Internet, using the World Wide Web.<BR>
<I>How To Use/How To Get Help:</I> For information on compatible
WWW browsers: info@ncbi.nlm.nih.gov<BR>
To access BankIt through the NCBI Home Page: http://www.ncbi.nlm.nih.gov/
<P>
<B>GenBank updates</B>
<P>
<I>Purpose:</I> Correct or update an existing sequence; request
release of published data.<BR>
<I>How To Use/How To Get Help:</I> Send an update request by e-mail to:
update@ncbi.nlm.nih.gov
<P>
<B>E-mail servers</B>
<P>
<I>Service:</I> retrieve@ncbi.nlm.nih.gov<BR>
<I>Purpose:</I> Retrieve GenBank and other sequence database records
from an e-mail server based on any text term, including accession
number, author name, locus, gene name, etc.<BR>
<I>How To Use/How To Get Help:</I> To receive documentation, send
a message containing only the word HELP to the server address.
For personal assistance, send questions to: retrieve-help@ncbi.nlm.nih.gov
<P>
<I>Service:</I> blast@ncbi.nlm.nih.gov<BR>
<I>Purpose:</I> Perform a sequence similarity search of GenBank
and other sequence databases using the BLAST algorithm.<BR>
<I>How To Use/How To Get Help:</I> To receive documentation, send
a message containing only the word HELP to the server address.
For personal assistance, send questions to: blast-help@ncbi.nlm.nih.gov
<P>
<B>Internet applications</B>
<P>
<I>Purpose:</I> "Client-server" programs, in which a client
program on a local PC, Mac, or Unix workstation queries an NCBI server
via the network.<BR>
<I>How To Use/How To Get Help:</I> All NCBI network applications
require Internet access and locally installed TCP/IP software.
<P>
<I>Service:</I> Network Entrez<BR>
<I>Purpose:</I> Point-and-click retrieval system for PCs running
Windows, Macs, and Unix workstations. Provides text-based searching
of sequence databases and a sequence-related subset of MEDLINE.
<BR>
<I>How To Use/How To Get Help:</I> To obtain client software,
send e-mail to: net-info@ncbi.nlm.nih.gov
<P>
<I>Service:</I> Network BLAST<BR>
<I>Purpose:</I> BLAST client for similarity searching for PC (DOS),
Mac, Unix, and VMS workstations.<BR>
<I>How To Use/How To Get Help:</I> To register and obtain client
software, send e-mail to: blast-help@ncbi.nlm.nih.gov
<P>
<I>Service:</I> World Wide Web access<BR>
<I>Purpose:</I> WWW access to NCBI databases and search services,
including BankIt for GenBank submissions and Web versions of RETRIEVE,
BLAST, and Entrez.<BR>
<I>How To Use/How To Get Help:</I> For information on compatible
WWW browsers: info@ncbi.nlm.nih.gov<BR>
To access the NCBI Home Page: http://www.ncbi.nlm.nih.gov/
<P>
<B>Anonymous FTP: ncbi.nlm.nih.gov</B>
<P>
<I>Purpose:</I> Obtain GenBank releases, NCBI software, and various
molecular biology databases.<BR>
<I>How To Use/How To Get Help:</I> Log in as "anonymous"
(unquoted) and enter your e-mail address as your password.
<P>
<B>CD-ROMs</B>
<P>
<I>Purpose:</I> For users who do not have Internet access
or who prefer a local copy of databases.<BR>
<I>How To Use/How To Get Help:</I> For information about subscriptions,
send e-mail to: info@ncbi.nlm.nih.gov<BR>
<P>
<I>Service:</I> Entrez (GPO list ID: ENT)<BR>
<I>Purpose:</I> CD-ROM version of Network Entrez. Annual subscription
(6 issues per year).<BR>
<I>How To Use/How To Get Help:</I> For technical assistance, send
e-mail questions to: entrez@ncbi.nlm.nih.gov
<P>
<I>Service:</I> GenBank (GPO list ID: NCBIF)<BR>
<I>Purpose:</I> GenBank in "flat-file" format, as used
by some commercial and academic software. Annual subscription
(6 issues per year).<BR>
<I>How To Use/How To Get Help:</I> Send e-mail to: info@ncbi.nlm.nih.gov
<P>
<A HREF="#TOC">Return to Table of Contents</A><HR>
<P>
<IMG SRC="newslogo.gif" ALIGN="BOTTOM">
<H4><A NAME="Masthead"></A>Masthead</H4>
<P>
<I>NCBI News</I> is distributed three times a year. We welcome
communication from users of NCBI databases and software and invite
suggestions for articles in future issues. Send correspondence
and suggestions to <I>NCBI News</I> at the address below.
<P>
NCBI News<BR>
National Library of Medicine<BR>
Bldg. 38A, Room 8N803<BR>
8600 Rockville Pike<BR>
Bethesda, MD 20894<BR>
Phone: (301) 496-2475<BR>
Fax: (301) 480-9241<BR>
E-mail: info@ncbi.nlm.nih.gov
<P>
<I>Editors</I>
<P>
Dennis Benson<BR>
Barbara Rapp
<P>
<I>Design Consultant</I>
<P>
Troy M. Hill
<P>
<I>Photography</I>
<P>
Karlton Jackson
<P>
<I>Editing, Graphics, and Production</I>
<P>
Veronica Johnson<BR>
Wendy B. Osborne
<P>
In 1988, Congress established the National Center for Biotechnology
Information as part of the National Library of Medicine; its charge
is to create automated systems for storing molecular biology,
biochemistry, and genetics data, and to perform research in computational
molecular biology.
<P>
The contents of this newsletter may be reprinted without permission.
The mention of trade names, commercial products, or organizations
does not imply endorsement by NCBI, NIH, or the U.S. Government.
<P>
NIH Publication No. 95-3272
<P>
ISSN 1060-8788
<P>
<A HREF="#TOC">Return to Table of Contents</A>
<hr>
<A HREF="/"><img src="/Gifs/ncbi_button.gif"
alt="NCBI Home"></A>
<A HREF="index.html"><img src="/Gifs/newsletter_button.gif"
alt="Newsletter Home"></A>
</BODY>
</HTML>