{"id":8683,"date":"2016-03-03T11:00:48","date_gmt":"2016-03-03T16:00:48","guid":{"rendered":"http:\/\/circulatingnow.nlm.nih.gov\/?p=8683"},"modified":"2018-07-18T16:05:18","modified_gmt":"2018-07-18T20:05:18","slug":"genbank-the-early-years-of-big-data","status":"publish","type":"post","link":"https:\/\/circulatingnow.nlm.nih.gov\/2016\/03\/03\/genbank-the-early-years-of-big-data\/","title":{"rendered":"GenBank & The Early Years of \u201cBig Data\u201d"},"content":{"rendered":"<p><em>In cooperation with our colleagues at the <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/\">National Center for Biotechnology Information (NCBI)<\/a>, National Library of Medicine (NLM), the NLM\u2019s History of Medicine Division recently acquired the archives of the early history of <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/genbank\/\">GenBank<\/a>, the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. Today <\/em>Circulating Now<em> welcomes guest blogger <a href=\"http:\/\/biologie.unige.ch\/en\/the-section\/the-networks\/science-education\/\" target=\"_blank\" rel=\"noopener\">Bruno J. Strasser<\/a>. Dr. Strasser is a professor at the University of Geneva, Switzerland, an adjunct professor at Yale University, and author of the book <\/em>Collecting Experiments: The New Production of Biomedical Knowledge<em>, forthcoming from the University of Chicago Press.<\/em><\/p>\n<p>\u201cAlmost the number of stars in the Milky Way.\u201d Through this stellar comparison, the National Institutes of Health proudly announced in 2005 that GenBank, its computerized collection of DNA sequences, had reached 50 billion bases (units of DNA). 
Today it contains far more: over 200 billion bases from more than 350,000 species, making it one of the largest scientific databases in the world.<\/p>\n<figure id=\"attachment_8686\" aria-describedby=\"caption-attachment-8686\" style=\"width: 222px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_crop.jpg?ssl=1\" rel=\"attachment wp-att-8686\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"8686\" data-permalink=\"https:\/\/circulatingnow.nlm.nih.gov\/2016\/03\/03\/genbank-the-early-years-of-big-data\/3360_001_crop\/\" data-orig-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_crop.jpg?fit=929%2C1041&ssl=1\" data-orig-size=\"929,1041\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;1&quot;}\" data-image-title=\"1980s GenBank Instructions\" data-image-description=\"<p>To connect to GenBank on-line:<br \/>\nCall the local Telenet node.*<br \/>\nType what is shown in blue.<br \/>\n&lt;Return&gt; (for 1200 BAUD)<br \/>\n@ (for 2400 BAUD)<br \/>\nTELENET<br \/>\nxxx xxx<br \/>\nTERMINAL=<br \/>\n@ C Genbank,genbank<br \/>\nPASSWORD = 4NIGMS<br \/>\nGENBANK CONNECTED<\/p>\n<p>IntelliGenetics IG-20 Tops-20….<br \/>\n@Genbank <\/p>\n<p>*If you don’t know your local node, call (800) 336-0437.<\/p>\n\" data-image-caption=\"<p>Detail from a GenBank brochure, ca. 
1985<\/p>\n\" data-medium-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_crop.jpg?fit=268%2C300&ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_crop.jpg?fit=840%2C941&ssl=1\" class=\"wp-image-8686 \" src=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_crop.jpg?resize=222%2C249&ssl=1\" alt=\"Instructions for connecting to GenBank via Telenet.\" width=\"222\" height=\"249\" \/><\/a><figcaption id=\"caption-attachment-8686\" class=\"wp-caption-text\">Detail from a GenBank brochure, ca. 1985<br \/><em>Courtesy National Library of Medicine Acc. 2015-045<\/em><\/figcaption><\/figure>\n<p>The creation of GenBank, like that of the heavens, was no small achievement. This archival collection of handwritten, typewritten, and printed documents deposited at the NLM reveals the first discussions among scientists and science administrators about this new infrastructure, created in 1982, and traces its first decade of existence. These papers offer a unique window onto the coming of age of \u201c<a href=\"https:\/\/datascience.nih.gov\/bd2k\/about\/what\">big data<\/a>,\u201d how it is transforming scientific research, and how it led to the \u201c<a href=\"https:\/\/www.whitehouse.gov\/blog\/2013\/02\/22\/expanding-public-access-results-federally-funded-research\">open access<\/a>\u201d movement. Today, as \u201cbig data\u201d is heralded as the \u201cnew oil\u201d and as our daily online actions are increasingly stored in databases for marketing and other purposes, it is worth reflecting on the history of our information age.<\/p>\n<p>In the sciences, the challenge of \u201cbig data\u201d arose particularly early and has transformed the way scientific research is done. GenBank has become an indispensable tool for biomedical researchers around the world. 
This encyclopedia of gene sequences is now a truly collaborative and worldwide effort. It includes the complete genomes of over 3,000 organisms, from humans to zebrafish, from rice to bacteria such as <em>E. coli<\/em>.<\/p>\n<p><iframe loading=\"lazy\" title=\"Bruno J. Strasser speaking at the Genbank 25th Anniversary\" width=\"840\" height=\"473\" src=\"https:\/\/www.youtube.com\/embed\/VRnY5HP3wjM?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Biomedical researchers go to GenBank to find the sequence of a given gene and its associated annotations, such as the organism from which the sequence was derived, its biological functions, and related scientific journal articles. More importantly, researchers search the database to find out whether it contains a sequence that closely resembles one they have determined in their laboratory from a specific organism. Often they do find a match, and the similarity tells them that the two sequences, though from different organisms, probably have a similar function, since they evolved from a common ancestor. 
This comparative approach is key to the success of contemporary biomedical research.<\/p>\n<figure id=\"attachment_8687\" aria-describedby=\"caption-attachment-8687\" style=\"width: 234px\" class=\"wp-caption alignleft\"><a href=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3369_001.png?ssl=1\" rel=\"attachment wp-att-8687\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"8687\" data-permalink=\"https:\/\/circulatingnow.nlm.nih.gov\/2016\/03\/03\/genbank-the-early-years-of-big-data\/3369_001\/\" data-orig-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3369_001.png?fit=934%2C1200&ssl=1\" data-orig-size=\"934,1200\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"NCBI News, 1992\" data-image-description=\"<p>[Title] NCBI News<\/p>\n<p>Volume I, Issue 3<br \/>\nNATIONAL CENTER for BIOTECHNOLOGY INFORMATION<\/p>\n<p>National Library of Medicine<br \/>\nNational Institutes of Health<br \/>\n8600 Rockville Pike<br \/>\nBethesda, MD 20894<br \/>\n(301) 496-2475<br \/>\nSeptember 1992<\/p>\n<p>[Article]<br \/>\nNCBI Continues GenBank Services<\/p>\n<p>Effective October 1, GenBank<br \/>\nmoves to the National Center<br \/>\nfor Biotechnology Information<br \/>\n(NCBI), at the National Library of<br \/>\nMedicine. NCBI will strive to continue the high degree of responsiveness<br \/>\nto the user community that<br \/>\nIntelliGenetics (IG) has demonstrated through the past five years<br \/>\nand to maintain continuity with<br \/>\nservices that have been provided by<br \/>\nIG.<br \/>\nThe standard flat-file format for<br \/>\nGenBank data distribution will continue. 
Similarly, the e-mail sequence<br \/>\nsearch and record retrieval servers and anonymous FTP access to the<br \/>\ndatabase will be provided by NCBI. Free IRX searching will not be provided,<br \/>\nbut IG will support it by subscription to the IG Timesharing Service.<br \/>\nDirect data submission will continue to be processed by Los Alamos<br \/>\nNational Laboratory, so authors should continue to send data and<br \/>\nAuthorin submissions to Los Alamos. NCBI will be enhancing coverage<br \/>\nof the database by scanning from a wider range of journals, adding<br \/>\nprotein sequence data and MEDLINE citations, and releasing at two-month<br \/>\nintervals. The major distribution modes are CD-ROM and FTP,<br \/>\nalthough magnetic tape distribution is available in VMS Backup format<br \/>\nby special order.<br \/>\nCD-ROM Distribution<br \/>\nThe NCBI CD-ROMs are available by subscription through the<br \/>\nGovernment Printing Office (see order form on page 3). The year’s subscription<br \/>\nbegins October 15, 1992, and includes a release every two<br \/>\nmonths. Three different titles (all in ISO 9660 format) are now available:<br \/>\n\u2022 Entrez: Sequences – integrated sequence data from GenBank,<br \/>\nEMBL, DDBJ, SWISS-PROT, and PIR linked to MEDLINE citations.<br \/>\nText retrieval software for Macintosh and PC-compatible systems<br \/>\nrunning Windows 3.1 is included, but there is no sequence<br \/>\nsimilarity search software.<\/p>\n<p>[IMAGE CAPTION]<br \/>\nDr. Jim Fleshman (right) demonstrates<br \/>\nEntrez: Sequences at the American Society<br \/>\nfor Microbiology Meeting in New Orleans.<\/p>\n<p>[Article]<br \/>\nNCBI on Exhibit<br \/>\nNCBI will be exhibiting the<br \/>\nEntrez: Sequences CD-ROM<br \/>\nand providing information on its<br \/>\ndatabase services at the American<br \/>\nSociety for Cell Biology meeting<br \/>\nin Denver, November 16-19,<br \/>\n1992. 
Come to Booth 547.<\/p>\n<p>[Table of Contents]<br \/>\nIN THIS ISSUE<br \/>\nGenBank Services … 1<br \/>\nNCBI on Exhibit … 1<br \/>\nCD-ROM Order Form … 3<br \/>\nDatabase Repository CD-ROM … 4<br \/>\nFrequently Asked Questions … 5<br \/>\ndbEST Database … 6<br \/>\nNew in Print … 6<\/p>\n\" data-image-caption=\"<p>NCBI News (Volume 1, Issue 3) September 1992, featuring news about the move of GenBank to the National Center for Biotechnology Information at the National Library of Medicine. Courtesy National Library of Medicine. Acc. 2015-045.<\/p>\n\" data-medium-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3369_001.png?fit=234%2C300&ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3369_001.png?fit=797%2C1024&ssl=1\" class=\"wp-image-8687 size-medium\" src=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3369_001.png?resize=234%2C300&ssl=1\" alt=\"The front page of a printed newsletter.\" width=\"234\" height=\"300\" \/><\/a><figcaption id=\"caption-attachment-8687\" class=\"wp-caption-text\"><em>NCBI News<\/em> (Volume 1, Issue 3) September 1992, featuring news about the move of GenBank to the National Center for Biotechnology Information at the National Library of Medicine.<br \/> <em>Courtesy National Library of Medicine Acc. 2015-045<\/em><\/figcaption><\/figure>\n<p>But in the late 1970s, when scientists first envisioned this collection of data, it was far from clear that it would be worth the effort. First, there was not yet much data to collect, and many biomedical researchers were still uncomfortable with computers. To some experimentalists, a data collection even sounded somewhat antiquated, more like a natural history museum or a library than one of the cutting-edge instruments enabling experimental virtuosity. 
And some doubted that funding such an infrastructure fell within the NIH\u2019s mission. But Dr. Elke Jordan, deputy director of the Genetics Program Branch at the National Institute of General Medical Sciences (NIGMS), and Dr. Ruth L. Kirschstein, director of NIGMS, with the help of a few key scientists such as Dr. Richard J. Roberts, eventually succeeded in issuing a request for proposals and, finally, in signing a contract with Los Alamos National Laboratory to host GenBank. Ten years later, operation of the GenBank database was transferred from Los Alamos National Laboratory to the <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/\">National Center for Biotechnology Information<\/a>, where it is maintained today. Scientists around the world consult it regularly and contribute to its growth by depositing their own data. These contributions are a fundamental part of the current era of \u201cbig data\u201d that continues to inform scientific discovery.<\/p>\n<p>Thanks to the staff of the NLM\u2019s History of Medicine Division and NCBI, the archives of GenBank are now preserved and publicly available for use at the NLM in the History of Medicine Reading Room. 
Tomorrow\u2019s researchers will find much of interest in the GenBank archives as they look back on the last quarter of the twentieth century to learn how scientists first conceived of a computerized collection of DNA sequences, which eventually became one of the largest scientific databases in the world.<\/p>\n<p><em><a href=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/bruno-strasser.png?ssl=1\" rel=\"attachment wp-att-8688\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" data-attachment-id=\"8688\" data-permalink=\"https:\/\/circulatingnow.nlm.nih.gov\/2016\/03\/03\/genbank-the-early-years-of-big-data\/bruno-strasser\/\" data-orig-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/bruno-strasser.png?fit=600%2C811&ssl=1\" data-orig-size=\"600,811\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Bruno J. Strasser\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/bruno-strasser.png?fit=222%2C300&ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/bruno-strasser.png?fit=600%2C811&ssl=1\" class=\"alignleft wp-image-8688\" src=\"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/bruno-strasser.png?resize=121%2C163&ssl=1\" alt=\"Casual portrait of Bruno J. Strasser in an urban setting.\" width=\"121\" height=\"163\" \/><\/a>Interested in learning more about the history of GenBank? See these articles:<\/em><\/p>\n<p>Strasser, Bruno J. 
\u201c<a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/18948528\">GenBank: Natural History in the 21<sup>st<\/sup> Century?<\/a>\u201d <em>Science<\/em> 322 (2008): 537-38, PMID: 18948528<\/p>\n<p>Strasser, Bruno J. \u201c<a href=\"http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/21667776\">The Experimenter\u2019s Museum: GenBank, Natural History, and the Moral Economies of Biomedicine.<\/a>\u201d <em>Isis<\/em> 102, no. 1 (2011): 60-96, PMID: 21667776<\/p>\n<p><em>If you or someone you know would like to consult the GenBank archives at the NLM, please contact the History of Medicine Division Reference staff at <a href=\"https:\/\/support.nlm.nih.gov\/ics\/support\/KBList.asp?folderID=150&from=\" target=\"_blank\">NLM Customer Support<\/a> or (301) 402-8878.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In cooperation with our colleagues at the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), the NLM\u2019s<\/p>\n","protected":false},"author":19605840,"featured_media":8685,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","advanced_seo_description":"","jetpack_seo_html_title":"","jetpack_seo_noindex":false,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"GenBank & The Early Years of \u201cBig Data\u201d - Guest blogger Bruno Strasser talks about NLM's recently acquired archive and GenBank 
history","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[14520,42333869,12763,103],"tags":[12080,22379,10694,2971278,668],"class_list":["post-8683","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-about-us","category-archives-manuscripts","category-collections","category-news","tag-archives","tag-data","tag-genetics","tag-recent-acquisitions","tag-research"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/circulatingnow.nlm.nih.gov\/wp-content\/uploads\/2016\/02\/3360_001_feature.jpg?fit=929%2C361&ssl=1","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3xcDk-2g3","jetpack-related-posts":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/posts\/8683","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/users\/19605840"}],"replies":[{"embeddable":true,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/comments?post=8683"}],"version-history":[{"count":15,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/posts\/8683\/revisions"}],"predecessor-version":[{"id":13143,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/posts\/8683\/revisions\/13143"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/media\/8685"}],"wp:attachment":[{"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/media?parent=8683"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/circulatingnow.nlm.nih.go
v\/wp-json\/wp\/v2\/categories?post=8683"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/circulatingnow.nlm.nih.gov\/wp-json\/wp\/v2\/tags?post=8683"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}