{"id":11043,"date":"2023-03-30T12:39:19","date_gmt":"2023-03-30T16:39:19","guid":{"rendered":"https:\/\/ncbiinsights.ncbi.nlm.nih.gov\/?p=11043"},"modified":"2023-03-30T14:32:15","modified_gmt":"2023-03-30T18:32:15","slug":"dbsnp-scalability-diversity-accessibility","status":"publish","type":"post","link":"https:\/\/ncbiinsights.ncbi.nlm.nih.gov\/2023\/03\/30\/dbsnp-scalability-diversity-accessibility\/","title":{"rendered":"dbSNP Enhances Scalability, Data Diversity, and Accessibility"},"content":{"rendered":"
As part of the <\/span>Human Genome Project<\/span><\/a>, NCBI, part of the National Library of Medicine, and the National Human Genome Research Institute (NHGRI) <\/span>established the Single Nucleotide Polymorphism database<\/span><\/a> (dbSNP) in 1998. Over the last 25 years, dbSNP has evolved into a reliable central public repository for genetic variation data. dbSNP is a community-accepted reference data set for genetic research, analysis pipelines, and both open-source and commercial tools, and it is essential to genetic discovery. For example, dbSNP data are used in nearly all human genetic variation research workflows and serve as the foundation for commercially available ancestry testing products. <\/span>\u00a0<\/span><\/p>\n <\/p>\n We have made numerous improvements that make molecular variation data more accessible for physical mapping, population genetics, studies of evolutionary relationships, genome-wide association studies, and rapid quantification of the variation at a given site of interest.\u00a0<\/span>\u00a0<\/span><\/p>\n We have fundamentally improved our infrastructure <\/span>and <\/span>the underlying technology <\/span>and <\/span>data release processes. These changes make dbSNP more reliable <\/span>and <\/span>efficient at handling the large volume of data <\/span>and <\/span>its exponential growth over the last few years. <\/span>\u00a0<\/span><\/p>\n We created the Allele Frequency Aggregator (<\/span>ALFA<\/span><\/a>) to provide more granular allele frequencies for populations, derived from 198K subjects in dbGaP controlled-access studies, with a goal of reaching 1M subjects. ALFA will improve the discovery of common and uncommon variants that have biological effects or contribute to disease. These data include chip array, exome, and genome sequencing data from 12 distinct populations, including European, African, Asian, and Latin American subjects. 
We include these data in regular dbSNP build releases and add ALFA data to RefSeq. ALFA data can be accessed through the <\/span>browser<\/span><\/a>, <\/span>FTP<\/span><\/a>, <\/span>API<\/span><\/a>, and <\/span>TrackHub<\/span><\/a>. \u00a0On <\/span>GitHub<\/span><\/a>, we provide tutorials and code examples to help with programmatic access.<\/span>\u00a0<\/span><\/p>\nCurrent dbSNP statistics include:<\/h5>\n
\n
What\u2019s new?<\/h5>\n