<!DOCTYPE html>
|
||
<html>
|
||
<head>
|
||
<title>Frequently Asked Questions - Genome Reference Consortium</title>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
||
<meta name="ncbi_app" content="grc" />
|
||
<meta name="ncbi_db" content="none" />
|
||
<meta name="ncbi_pdid" content="screen_default" />
|
||
<meta name="ncbi_pagename" content="faq" />
|
||
<meta name="ncbi_pagetitle" content="Frequently Asked Questions" />
|
||
|
||
<link rel="stylesheet" type="text/css" href="/grc/static/grc/css/grc.css" />
|
||
<style>
|
||
#menu ul#primary a#helpMenuItem {
|
||
border: 1px solid #666;
|
||
border-bottom: none;
|
||
background: #333;
|
||
padding-bottom: 6px;
|
||
margin-top: 0;
|
||
color: #ff9933;
|
||
text-decoration: none;
|
||
}
|
||
|
||
|
||
#menu ul#secondary li a#help-faqMenuItem {
|
||
color: #EFEFEF;
|
||
}
|
||
</style>
|
||
|
||
<link rel="stylesheet" type="text/css" href="/core/jig/1.14.8/css/jig.min.css" />
|
||
|
||
<script type="text/javascript" src="/core/jig/1.14.8/js/jig.min.js"></script>
|
||
|
||
<script type="text/javascript" src="/grc/static/grc/js/grc.js"></script>
|
||
|
||
|
||
</head>
|
||
<body>
|
||
<noscript>
|
||
<p>
|
||
<b>Warning:</b> this web site requires JavaScript to function. <a href="/guide/browsers#js_settings">more...</a>
|
||
</p>
|
||
</noscript>
|
||
<div id="header-container">
|
||
<div id="header">
<a href="/grc">
<img alt="GRC logo" src="/grc/static/grc/img/GRC_logo_reasonably_small.png" /><img src="/grc/static/grc/img/TitleBanner.png" alt="Genome Reference Consortium" /></a>
|
||
</div>
|
||
</div>
|
||
<div id="page">
|
||
<a id="skipnav" href="#main">Skip navigation and go to main content</a>
|
||
<div id="menu">
|
||
<ul id="primary">
|
||
<li><a href="/grc" title="GRC home" id="homeMenuItem">GRC Home</a></li>
|
||
<li><a href="/grc/data" title="Data" id="dataMenuItem">Data</a>
|
||
|
||
<ul id="secondary">
|
||
<li><a href="/grc/help" title="Help overview" id="help-helpMenuItem">Overview</a></li>
|
||
<li><a href="/grc/help/definitions" title="GRC definitions" id="help-definitionsMenuItem">Definitions</a></li>
|
||
<li><a href="/grc/help/faq" title="Frequently Asked Questions" id="help-faqMenuItem">FAQ</a></li>
|
||
<li><a href="/grc/help/patches" title="Patches tutorial" id="help-patchesMenuItem">Patches</a></li>
|
||
<li><a href="/grc/help/human-examples" title="Examples of human regions" id="help-human-examplesMenuItem">Human Region Examples</a></li>
|
||
<li><a href="/grc/help/mouse-examples" title="Examples of mouse regions" id="help-mouse-examplesMenuItem">Mouse Region Examples</a></li>
|
||
<li><a href="/grc/help/workshops" title="Workshops and presentation slides" id="help-workshopsMenuItem">Workshops</a></li>
|
||
</ul>
|
||
|
||
|
||
</li>
|
||
<li><a href="/grc/help" title="Information and help" id="helpMenuItem">Help</a></li>
|
||
<li><a href="/grc/report-an-issue" title="Report a problem" id="reportAnIssueMenuItem">Report an Issue</a></li>
|
||
<li><a href="/grc/contact-us" title="contact us" id="contactUsMenuItem">Contact Us</a></li>
|
||
<li><a href="/grc/credits" title="credits" id="creditsMenuItem">Credits</a></li>
|
||
|
||
<li><a href="/projects/genome/assembly/grc/curation" title="Curators only, authentication required" id="curatorsOnlyMenuItem">Curators Only</a></li>
|
||
|
||
</ul>
|
||
</div><!--end menu-->
|
||
<div id="main"><a name="main"></a>
|
||
<div id="faq">
|
||
<div id="contents" style="overflow:hidden;">
|
||
|
||
<div id="grc-cms-content">
|
||
|
||
<h1 id="frequently-asked-questions">Frequently Asked Questions</h1>
|
||
|
||
<p/>
|
||
|
||
<h4 id="genome-reference-assembly">Genome Reference Assembly</h4>
|
||
|
||
<ul>
|
||
<li><a href="#name-of-human-reference-genome">What is the correct name for the human genome reference assembly?</a></li>
|
||
<li><a href="#name-of-mouse-reference-genome">What is the correct name for the mouse genome reference assembly?</a></li>
|
||
<li><a href="#name-of-zebrafish-reference-genome">What is the correct name for the zebrafish genome reference assembly?</a></li>
|
||
<li><a href="#name-of-chicken-reference-genome">What is the correct name for the chicken genome reference assembly?</a></li>
|
||
<li><a href="#human-reference-genome-individuals">How many individuals were sequenced for the human reference genome assembly?</a></li>
|
||
<li><a href="#dna-source-of-human-reference-genome">Where can I get information about the DNA sources for the human reference genome?</a></li>
|
||
<li><a href="#dna-source-of-mouse-reference-genome">Where can I get information about the DNA sources for the mouse reference genome?</a></li>
|
||
<li><a href="#dna-source-of-zebrafish-reference-genome">Where can I get information about the DNA source for the zebrafish reference genome?</a></li>
|
||
<li><a href="#dna-source-of-chicken-reference-genome">Where can I get information about the DNA source for the chicken reference genome?</a></li>
|
||
<li><a href="#how-to-obtain-clones">How can I obtain genomic clones used in the reference assemblies?</a></li>
|
||
</ul>
|
||
|
||
<h4 id="obtaining-data-and-assembly-upda">Obtaining Data and Assembly Updates</h4>
|
||
|
||
<ul>
|
||
<li><a href="#which-genome-is-reference">There are many human genome assemblies in GenBank. Which one is the reference?</a></li>
|
||
<li><a href="#working-version-of-reference-assembly">Where can I see the current working version of the reference assembly?</a></li>
|
||
<li><a href="#download-genome-sequences">Where can I download genome sequences?</a></li>
|
||
<li><a href="#source-of-sequences-for-human-reference">Where can I find the component sequences for the human reference assembly?</a></li>
|
||
<li><a href="#reference-statistics">Where can I find assembly statistics?</a></li>
|
||
<li><a href="#assembly-update-schedule">When are you going to update the human/mouse/zebrafish/chicken reference genome assembly again?</a></li>
|
||
<li><a href="#difference-between-major-and-minor-release">What is the difference between a GRC major assembly release and a patch (minor assembly) release?</a></li>
|
||
<li><a href="#difference-between-alternate-loci-and-novel-patch">What are alternate loci and novel patches?</a></li>
|
||
<li><a href="#fix-patches">What are fix patches?</a></li>
|
||
<li><a href="#assembly-regions">What are assembly regions?</a></li>
|
||
<li><a href="#mhc-haplotypes-in-reference">What MHC haplotypes are used in the reference assembly?</a></li>
|
||
<li><a href="#lrc-kir-haplotypes-in-reference">What LRC-KIR haplotypes are used in the reference assembly?</a></li>
|
||
<li><a href="#format-reference-data-for-read-alignment">Can I get reference assembly data sets formatted for use by sequence read alignment pipelines?</a></li>
|
||
<li><a href="#human-reference-genome-and-common-alleles">Does the human reference genome represent common/major allele at all chromosomal loci?</a></li>
|
||
<li><a href="#reference-assembly-method">What assembly method was used to create the reference assembly?</a></li>
|
||
<li><a href="#grc-assembly-resources">What types of assembly resources are available from the GRC?</a></li>
|
||
<li><a href="#issue-type">What are the different types of GRC curation issues?</a></li>
|
||
<li><a href="#issue-status">What is meant by the different statuses for GRC curation issues?</a></li>
|
||
<li><a href="#genes-annotations-in-reference-genome">Where can I find gene content and annotations of the genome reference assembly?</a></li>
|
||
<li><a href="#ribosomal-dna-in-reference">Does the human reference assembly contain representation for ribosomal DNA sequences?</a></li>
|
||
<li><a href="#grc-issues">How do I find the latest data about reference assembly problems?</a></li>
|
||
<li><a href="#reporting-assembly-problems">How can I report an assembly problem?</a></li>
|
||
<li><a href="#how-to-cite">How to cite the GRC or a reference assembly?</a></li>
|
||
</ul>
|
||
|
||
<h4 id="bioinformatics-tools">Bioinformatics Tools</h4>
|
||
|
||
<ul>
|
||
<li><a href="#mapping-between-releases-assembly-alignments-or-converting-reference-and-other-assemblies">What tools can I use to map/convert data between different releases of the reference assembly or between the reference and other genome assemblies?</a></li>
|
||
</ul>
|
||
|
||
<hr/>
|
||
|
||
<p/>
|
||
|
||
<h4 id="genome-reference-assembly_1">Genome Reference Assembly</h4>
|
||
|
||
<p><span id="name-of-human-reference-genome"/>
|
||
<strong>What is the correct name for the human genome reference assembly?</strong></p>
|
||
|
||
<p>The official name for the current human reference genome assembly is <a href="https://www.ncbi.nlm.nih.gov/grc/human">Genome Reference Consortium Human Build 38</a>. It is abbreviated as GRCh38. GRCh38 is referred to as hg38 in the UCSC Genome Browser, but this is not the official assembly name or abbreviation. The GenBank accession for GRCh38 is GCA_000001405.15. RefSeq annotates an identical copy of GRCh38, which has the accession GCF_000001405.26. An accession provides a unique and unambiguous assembly identifier, and the GRC recommends its use in all publications and assembly communications. Assembly patch releases, which provide corrections and add new alternate sequence representations to the reference without changing chromosome coordinates, are named by the addition of the suffix “.p&lt;number&gt;” to the assembly name and a version increment to the GenBank accession. For example, the ninth patch release of GRCh38 is officially known as Genome Reference Consortium Human Build 38 patch release 9, is abbreviated as GRCh38.p9, and has GenBank accession GCA_000001405.24 and RefSeq accession GCF_000001405.35. Please see the GRC website’s <a href="https://www.ncbi.nlm.nih.gov/grc/human">Human Overview</a> for the current patch release.</p>
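<p>The naming scheme described above can be checked mechanically. The following is a minimal sketch, not a GRC tool: the function names and regular expressions are illustrative assumptions, based only on the assembly names and accessions quoted in this FAQ.</p>

```python
import re
from typing import NamedTuple, Optional


class AssemblyName(NamedTuple):
    base: str             # major release, e.g. "GRCh38"
    patch: Optional[int]  # patch number, or None for a major release


class AssemblyAccession(NamedTuple):
    source: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    number: str   # nine-digit identifier shared by all versions
    version: int  # incremented with each patch release


# Assumed patterns, inferred from names such as GRCh38, GRCh38.p9 and GRCg6a.
_NAME_RE = re.compile(r"^(GRC[a-z]\d+[a-z]?)(?:\.p(\d+))?$")
_ACC_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def parse_assembly_name(name: str) -> AssemblyName:
    """Split an abbreviated assembly name into major release and patch number."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised GRC assembly name: {name!r}")
    base, patch = match.groups()
    return AssemblyName(base, int(patch) if patch is not None else None)


def parse_accession(accession: str) -> AssemblyAccession:
    """Split an assembly accession into source, identifier and version."""
    match = _ACC_RE.match(accession)
    if match is None:
        raise ValueError(f"not a recognised assembly accession: {accession!r}")
    source, number, version = match.groups()
    return AssemblyAccession(source, number, int(version))


print(parse_assembly_name("GRCh38.p9"))
print(parse_accession("GCA_000001405.24"))
```

<p>Because the patch number and the accession version are separate counters, a script should treat the full accession, not the name alone, as the unambiguous identifier.</p>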
|
||
|
||
<p><span id="name-of-mouse-reference-genome"/>
|
||
<strong>What is the correct name for the mouse genome reference assembly?</strong></p>
|
||
|
||
<p>The official name for the current mouse reference genome assembly is <a href="https://www.ncbi.nlm.nih.gov/grc/mouse">Genome Reference Consortium Mouse Build 38</a>. It is abbreviated as GRCm38. GRCm38 is referred to as mm10 in the UCSC Genome Browser, but this is not the official assembly name or abbreviation. The GenBank accession for GRCm38 is GCA_000001635.2. RefSeq annotates an identical copy of GRCm38, which has the accession GCF_000001635.20. An accession provides a unique and unambiguous assembly identifier, and the GRC recommends its use in all publications and assembly communications. Assembly patch releases, which provide corrections and add new alternate sequence representations to the reference without changing chromosome coordinates, are named by the addition of the suffix “.p&lt;number&gt;” to the assembly name and a version increment to the GenBank accession. For example, the fifth patch release of GRCm38 is officially known as Genome Reference Consortium Mouse Build 38 patch release 5, is abbreviated as <a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.25/">GRCm38.p5</a>, and has GenBank accession GCA_000001635.7 and RefSeq accession GCF_000001635.25. Please see the GRC website’s <a href="https://www.ncbi.nlm.nih.gov/grc/mouse">Mouse Overview</a> for the current patch release.</p>
|
||
|
||
<p><span id="name-of-zebrafish-reference-genome"/>
|
||
<strong>What is the correct name for the zebrafish genome reference assembly?</strong></p>
|
||
|
||
<p>The official name for the current zebrafish reference genome assembly is <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish">Genome Reference Consortium Zebrafish Build 11</a>. It is abbreviated as GRCz11. GRCz11 is referred to as danRer11 in the UCSC Genome Browser, but this is not the official assembly name or abbreviation. The GenBank accession for GRCz11 is GCA_000002035.4. RefSeq annotates an identical copy of GRCz11, which has the accession GCF_000002035.6. An accession provides a unique and unambiguous assembly identifier, and the GRC recommends its use in all publications and assembly communications.</p>
|
||
|
||
<p><span id="name-of-chicken-reference-genome"/>
|
||
<strong>What is the correct name for the chicken genome reference assembly?</strong></p>
|
||
|
||
<p>The official name for the current chicken reference genome assembly is <a href="https://www.ncbi.nlm.nih.gov/grc/chicken">Genome Reference Consortium Chicken Build 6a</a>. It is abbreviated as GRCg6a. The GenBank accession for the assembly is GCA_000002315.5. RefSeq annotates an identical copy of GRCg6a, which has the accession GCF_000002315.5. An accession provides a unique and unambiguous assembly identifier, and the GRC recommends its use in all publications and assembly communications.</p>
|
||
|
||
<p><span id="human-reference-genome-individuals"/>
|
||
<strong>How many individuals were sequenced for the human reference genome assembly?</strong></p>
|
||
|
||
<p>The human reference genome is a composite genome, derived from the sequence of several different anonymous individuals. Approximately 93% of the GRCh38 primary assembly (the assembled chromosomes plus unlocalized and unplaced sequences) consists of sequences from 11 genomic clone libraries (a library can generally be considered a proxy for an individual’s genome). One of these libraries, RP11 (the RPCI-11 Human Male BAC Library), has much higher representation than any other and contributes 70% of the primary assembly. The donor of the RP11 library was an anonymous male, though analysis suggests his DNA is an African-European admixture (see page 146 of the Supporting Online Material of <a href="https://www.ncbi.nlm.nih.gov/pubmed/?term=20448178">PMID:20448178</a>). The remaining 7% represents sequences from more than 50 libraries. These libraries were developed from individuals (male and female), as well as from flow-sorted chromosomes from various cell lines. The make-up of GRCh37 is largely similar to that of GRCh38 (Figure 1).</p>
|
||
|
||
<p><img alt="GRCh37_38_LibrariesBreakdown" height="320" src="/core/assets/grc-web-docs/img/GRCh37_38_LibrariesBreakdown.png" width="397"/></p>
|
||
|
||
<p><strong>Figure 1.</strong> Contribution of genomic libraries to GRCh37 and GRCh38. </p>
|
||
|
||
<p><span id="dna-source-of-human-reference-genome"/>
|
||
<strong>Where can I get information about the DNA sources for the human reference genome?</strong></p>
|
||
|
||
<p>Publications describing the generation of the human reference assembly provide information about the DNA sources used:</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/11237011">Initial sequencing and analysis of the human genome</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/15496913">Finishing the euchromatic sequence of the human genome</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/?term=28396521">Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly</a></p>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>Also, see FAQ <a href="#human-reference-genome-individuals">How many individuals were sequenced for the human reference genome assembly?</a></p>
|
||
|
||
<p><span id="dna-source-of-mouse-reference-genome"/>
|
||
<strong>Where can I get information about the DNA sources for the mouse reference genome?</strong></p>
|
||
|
||
<p>Publications describing the mouse reference assembly include information about DNA sources:</p>
|
||
|
||
<ul>
|
||
<li>
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/12466850">Initial sequencing and comparative analysis of the mouse genome</a></p>
|
||
</li>
|
||
<li>
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/19468303">Lineage-specific biology revealed by a finished genome assembly of the mouse</a></p>
|
||
</li>
|
||
</ul>
|
||
|
||
<p>In the mouse reference assembly, all sequences in the primary assembly unit (chromosomes, unlocalized and unplaced scaffolds) represent the C57BL/6J strain. These sequences overwhelmingly come from genomic libraries derived from 3 different mice. The RP23 and RP24 BAC libraries come from a female and a male mouse, respectively, both of which are from generation F204-207. The fosmid WI1 library was generated from a female mouse in generation F208-F214. For several genomic regions that exhibit strain variation, the GRC provides alternate loci assembly units containing scaffolds composed of clone-based sequences from other strains.</p>
|
||
|
||
<p><span id="dna-source-of-zebrafish-reference-genome"/>
|
||
<strong>Where can I get information about the DNA source for the zebrafish reference genome?</strong> </p>
|
||
|
||
<p>The publication describing the zebrafish reference assembly includes information about the DNA source:</p>
|
||
|
||
<ul>
|
||
<li><a href="https://www.ncbi.nlm.nih.gov/pubmed/23594743">The zebrafish reference genome sequence and its relationship to the human genome</a></li>
|
||
</ul>
|
||
|
||
<p/>
|
||
|
||
<p>The zebrafish genome reference assembly represents the sequence of the Tuebingen strain. Several genomic clone libraries and whole-genome shotgun (WGS) sequencing were used; the main description of the zebrafish genome assembly construction can be found in the Supplementary Information of the paper.</p>
|
||
|
||
<p><span id="dna-source-of-chicken-reference-genome"/>
|
||
<strong>Where can I get information about the DNA source for the chicken reference genome?</strong></p>
|
||
|
||
<p>The publication describing the chicken reference assembly includes information about the DNA source:
|
||
<a href="https://www.ncbi.nlm.nih.gov/pubmed/27852011">A New Chicken Genome Assembly Provides Insight into Avian Genome Structure</a></p>
|
||
|
||
<p>The chicken genome reference assembly GRCg6a was generated from a single individual from the <a href="https://www.ncbi.nlm.nih.gov/biosample/SAMN02981218/">Red Jungle Fowl strain, inbred line UCD001</a>. The assembly is a hybrid composed primarily of WGS contigs, into which genomic clones from the CH261, J_AD, J_AE, TAM31, TAM32 and TAM33 libraries have been integrated.</p>
|
||
|
||
<hr/>
|
||
|
||
<p/>
|
||
|
||
<h4 id="obtaining-data-and-assembly-upda_1">Obtaining Data and Assembly Updates</h4>
|
||
|
||
<p><span id="which-genome-is-reference"/></p>
|
||
|
||
<p><strong>There are many human genome assemblies in <a href="https://www.ncbi.nlm.nih.gov/assembly/organism/9606/all/">GenBank</a>. Which one is the reference?</strong></p>
|
||
|
||
<p><a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26">GRCh38</a> is the current major release of the human reference assembly. It is curated by the GRC, which also releases minor assembly updates that provide corrections and add new alternate sequence representations without changing chromosome coordinates. You can recognize minor releases by the suffix “.p&lt;number&gt;”. For example, <a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.35/">GRCh38.p9</a> is the ninth minor/patch release of the human reference assembly. On an assembly record, the “Genome representation” indicates whether the assembly is “full” or “partial”. The “version status”, shown in parentheses after the assembly accession, indicates whether the assembly is the “latest” or a “replaced” version. See the FAQ <a href="#name-of-human-reference-genome">What is the correct name for the human genome reference assembly?</a> for additional information.</p>
|
||
|
||
<p><span id="working-version-of-reference-assembly"/></p>
|
||
|
||
<p><strong>Where can I see the current working version of the reference assembly?</strong></p>
|
||
|
||
<p>Working assembly versions include pre-release assembly edits and are subject to change at any time. You can view the tiling path files (TPFs) for the current working versions of all GRC-curated assemblies on the <a href="https://www.ncbi.nlm.nih.gov/grc/tpf">TPF Overview</a> page of the GRC website. The TPFs and corresponding <a href="https://www.ncbi.nlm.nih.gov/assembly/agp/">AGP</a> (A Golden Path) files are also available on the <a href="https://ftp.ncbi.nlm.nih.gov/pub/grc/">GRC FTP</a> site in the “MOST_RECENT” sub-directory of the GRC sub-directory for each organism <a href="https://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/MOST_RECENT/">(e.g. ftp://ftp.ncbi.nlm.nih.gov/pub/grc/human/GRC/MOST_RECENT/)</a>.</p>
|
||
|
||
<p><span id="download-genome-sequences"/></p>
|
||
|
||
<p><strong>Where can I download genome sequences?</strong></p>
|
||
|
||
<p>GRC assemblies should be downloaded from the GenBank Assembly FTP site. The GRC organism overview webpages provide convenient access to these data. From the <a href="https://www.ncbi.nlm.nih.gov/grc">GRC Home page</a> select an organism (<a href="https://www.ncbi.nlm.nih.gov/grc/human">Human</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/mouse">Mouse</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish">Zebrafish</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/chicken">Chicken</a>). The FTP links to the genome data and sequences can be found in the page section “Download data”. </p>
|
||
|
||
<p><span id="source-of-sequences-for-human-reference"/></p>
|
||
|
||
<p><strong>Where can I find the component sequences for the human reference assembly?</strong></p>
|
||
|
||
<p>The construction of all GRC-curated assemblies is described in <a href="https://www.ncbi.nlm.nih.gov/assembly/agp/">AGP</a> (A Golden Path) files, which can be downloaded from the corresponding GenBank FTP directory. The organism overview pages on the GRC website (e.g. <a href="https://www.ncbi.nlm.nih.gov/grc/human">https://www.ncbi.nlm.nih.gov/grc/human</a>) provide easy access to these data via the “Download data” section. The AGP files are found in the assembly unit subdirectories. Please see the current <a href="https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/">AGP specifications</a> for a detailed description of this file format.</p>
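<p>As a rough illustration of how component sequences can be read out of an AGP file, the sketch below parses a single AGP line into named fields. It assumes the nine tab-separated columns of the AGP specification, in which component types <code>N</code> and <code>U</code> mark gaps. The function name and the example object and component identifiers are made up for illustration and do not come from any GRC assembly.</p>

```python
from typing import Dict, Union

# Component types that denote gaps rather than sequence components.
GAP_TYPES = {"N", "U"}


def parse_agp_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse one non-comment AGP line into a dictionary of named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")

    # Columns 1-5 have the same meaning on every line.
    record: Dict[str, Union[str, int]] = {
        "object": fields[0],
        "object_beg": int(fields[1]),
        "object_end": int(fields[2]),
        "part_number": int(fields[3]),
        "component_type": fields[4],
    }

    # Columns 6-9 depend on whether the line describes a gap or a component.
    if fields[4] in GAP_TYPES:
        record.update(
            gap_length=int(fields[5]),
            gap_type=fields[6],
            linkage=fields[7],
            linkage_evidence=fields[8],
        )
    else:
        record.update(
            component_id=fields[5],
            component_beg=int(fields[6]),
            component_end=int(fields[7]),
            orientation=fields[8],
        )
    return record


# Hypothetical lines: "chrEx" and "XX000000.1" are placeholder identifiers.
component = parse_agp_line("chrEx\t1\t1000\t1\tF\tXX000000.1\t1\t1000\t+")
gap = parse_agp_line("chrEx\t1001\t1100\t2\tN\t100\tscaffold\tyes\tpaired-ends")
print(component["component_id"], gap["gap_length"])
```

<p>When reading a whole file, lines that begin with <code>#</code> are comments and should be skipped before calling the parser.</p>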
|
||
|
||
<p><span id="reference-statistics"/></p>
|
||
|
||
<p><strong>Where can I find assembly statistics?</strong></p>
|
||
|
||
<p>Assembly statistics for GRC-curated organisms are shown on the GRC website and can be downloaded from GenBank:</p>
|
||
|
||
<p><strong>Human</strong>: <a href="https://www.ncbi.nlm.nih.gov/grc/human/data">Human Assembly Data</a> for the current release. To download the statistics report for the latest release, go to <a href="https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001405">https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001405</a>. On this page, clicking the assembly in the list takes you to the assembly page for the latest release. Select <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_assembly_stats.txt">Download the statistics report</a> on the right side of the page, under “Access the data”.</p>
|
||
|
||
<p><strong>Mouse</strong>: <a href="https://www.ncbi.nlm.nih.gov/grc/mouse/data">Mouse Assembly Data</a> for the current release. To download the statistics report for the latest release, go to <a href="https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001635">https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000001635</a>. On this page, clicking the assembly in the list takes you to the assembly page for the latest release. Select <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.26_GRCm38.p6/GCF_000001635.26_GRCm38.p6_assembly_stats.txt">Download the statistics report</a> on the right side of the page, under “Access the data”.</p>
|
||
|
||
<p><strong>Zebrafish</strong>: <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish/data">Zebrafish Assembly Data</a> for the current release. To download the statistics report for the latest release, go to <a href="https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000002035">https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000002035</a>. On this page, clicking the assembly in the list takes you to the assembly page for the latest release. Select <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/035/GCF_000002035.6_GRCz11/GCF_000002035.6_GRCz11_assembly_stats.txt">Download the statistics report</a> on the right side of the page, under “Access the data”.</p>
|
||
|
||
<p><strong>Chicken</strong>: <a href="https://www.ncbi.nlm.nih.gov/grc/chicken/data">Chicken Assembly Data</a> for the current release. To download the statistics report for the latest release, go to <a href="https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000002315">https://www.ncbi.nlm.nih.gov/assembly/?term=GCF_000002315</a>. On this page, clicking the assembly in the list takes you to the assembly page for the latest release. Select <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/315/GCF_000002315.5_GRCg6a/GCF_000002315.5_GRCg6a_assembly_stats.txt">Download the statistics report</a> on the right side of the page, under “Access the data”.</p>
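<p>The four statistics report links above share one path layout, so the URL can be assembled from an accession and an assembly name. The sketch below is an assumption drawn only from those four links, not a documented API: the function name is hypothetical, and the layout should be confirmed against the FTP site before it is relied on.</p>

```python
def assembly_stats_url(accession: str, name: str) -> str:
    """Build the statistics report URL for an assembly accession and name.

    The layout is inferred from the links in this FAQ: the nine-digit
    identifier is split into three directories of three digits each.
    """
    prefix, rest = accession.split("_", 1)   # "GCF", "000001405.38"
    digits = rest.split(".", 1)[0]           # "000001405"
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected accession format: {accession!r}")

    triplets = [digits[i:i + 3] for i in range(0, 9, 3)]
    stem = f"{accession}_{name}"             # directory name and file prefix
    path = "/".join([prefix, *triplets, stem, f"{stem}_assembly_stats.txt"])
    return "https://ftp.ncbi.nlm.nih.gov/genomes/all/" + path


print(assembly_stats_url("GCF_000001405.38", "GRCh38.p12"))
```

<p>Called with <code>GCF_000002315.5</code> and <code>GRCg6a</code>, the same function reproduces the chicken link above.</p>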
|
||
|
||
<p><span id="assembly-update-schedule"/></p>
|
||
|
||
<p><strong>When are you going to update the human/mouse/zebrafish/chicken reference genome assembly again?</strong></p>
|
||
|
||
<p>All assembly release plans, including those for patch updates that do not change coordinates, are provided on the organism-specific pages of the GRC website (<a href="https://www.ncbi.nlm.nih.gov/grc/human">Human</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/mouse">Mouse</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish">Zebrafish</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/chicken">Chicken</a>). When we start to plan a major release for any organism, we make community-specific announcements via multiple routes of communication, and we continue to provide updates as timelines become more defined. To receive information about the latest assembly updates, subscribe to the <a href="https://www.ncbi.nlm.nih.gov/mailman/listinfo/grc-announce">GRC-announce</a> email list.</p>
|
||
|
||
<p><span id="difference-between-major-and-minor-release"/></p>
|
||
|
||
<p><strong>What is the difference between a GRC major assembly release and a patch (minor assembly) release?</strong></p>
|
||
|
||
<p>A GRC major assembly release, such as <a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26">GRCh38</a> (human) or <a href="https://www.ncbi.nlm.nih.gov/assembly/GCA_000001635.2">GRCm38</a> (mouse), comprises a primary assembly unit (consisting of the chromosomes, unlocalized and unplaced scaffolds) and one or more alternate loci assembly units that contain scaffolds providing alternate sequence representations for discrete regions of the primary assembly unit, together with the alignments of those scaffolds to the chromosomes. A patch release, such as <a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.38/">GRCh38.p12</a> (human) or <a href="https://www.ncbi.nlm.nih.gov/assembly/GCF_000001635.26">GRCm38.p6</a> (mouse), is a minor assembly release that does not change any sequence coordinates in the major assembly release. In addition to having the same assembly units as the corresponding major release, patch releases include a “PATCHES” assembly unit. This additional unit includes scaffolds that provide updated sequence for particular genomic regions in the form of fix patches (assembly corrections) and novel patches (alternate sequence representations). Patch releases are cumulative, such that the PATCHES assembly unit for the latest minor release contains the scaffolds and associated data for all prior patch releases. For additional information, please see <a href="https://www.ncbi.nlm.nih.gov/grc/help/patches">Introductions to Patches</a>.</p>
|
||
|
||
<p><span id="difference-between-alternate-loci-and-novel-patch"/></p>
|
||
|
||
<p><strong>What are alternate loci and novel patches?</strong></p>
|
||
|
||
<p>Alternate loci and novel patches enable the reference assembly to represent allelic diversity. They are scaffold sequences that are given chromosome context through alignments to the corresponding chromosome regions. Alternate loci scaffolds and their alignments are included in major assembly releases, while novel patch scaffolds and their alignments are included in subsequent patch releases for that assembly. They can be considered functionally equivalent, as novel patches will be reassigned to the role of alternate loci scaffolds at the time of the next major assembly release. Assembly regions for which the GRC provides alternate loci or novel patch scaffolds are typically those with known alternate haplotypes (e.g. immune-associated regions), highly variable genomic regions (e.g. olfactory receptor regions) or those where there are structural variants having 5 Kb or more sequence not represented on the chromosome.
|
||
Human alternate loci and all novel patch scaffolds also include one or more anchor sequence components to ensure their robust alignment to the chromosomes. Anchor sequences are component(s) that are also found in the corresponding chromosome. The sequence locations corresponding to anchor components are annotated on the GenBank records for all alternate loci and patch scaffolds. For more detail on patches please see <a href="https://www.ncbi.nlm.nih.gov/grc/help/patches">Introductions to Patches</a>.</p>
|
||
|
||
<p><span id="fix-patches"/></p>
|
||
|
||
<p><strong>What are fix patches?</strong></p>
|
||
|
||
<p>Fix patches represent changes to existing assembly sequences. These are generally error corrections (such as base changes, component replacements/updates, switch point updates or tiling path changes) or assembly improvements (such as extension of sequence into gaps). Fix patches are scaffold sequences that are given chromosome context through alignments to the corresponding chromosome regions. A fix patch scaffold represents a preview of what the assembly will look like at the next major release. When the next major release occurs, the accessions for the fix patch scaffolds will be deprecated and the changes will be found in the chromosomes.
|
||
Human fix patch scaffolds also include one or more anchor sequence components to ensure their robust alignment to the chromosomes. Anchor sequences are component(s) that are also found in the corresponding chromosome. The sequence locations corresponding to anchor components are annotated on the GenBank records for all fix patch scaffolds. For more detail on patches please see <a href="https://www.ncbi.nlm.nih.gov/grc/help/patches">Introductions to Patches</a>.</p>
|
||
|
||
<p><span id="assembly-regions"/></p>
|
||
|
||
<p><strong>What are assembly regions?</strong></p>
|
||
|
||
<p>Human alternate loci and all patches are assigned to named assembly regions. The regions to which alternate loci and novel patches are assigned are defined as sequence ranges on primary assembly unit sequences (chromosome and unlocalized or unplaced scaffolds), while regions for fix patches may also be found on alternate loci scaffolds. A region contains one or more alternate loci or patch scaffolds. While scaffolds from different assembly units may overlap within a region, there is no overlap of scaffolds from the same assembly unit within a region.
|
||
The GRC provides web pages with detailed information for each region; these can be accessed from the “Patches and alternate loci” tables found on the organism overview pages of the GRC website (e.g. <a href="https://www.ncbi.nlm.nih.gov/grc/human">https://www.ncbi.nlm.nih.gov/grc/human</a>). Assembly region reports can also be downloaded from the GenBank FTP site for the assembly of interest (e.g. <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.38_GRCh38.p12/GCF_000001405.38_GRCh38.p12_assembly_regions.txt">GRCh38.p12_assembly_regions.txt</a> and <a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.26_GRCm38.p6/GCF_000001635.26_GRCm38.p6_assembly_regions.txt">GRCm38.p6_assembly_regions.txt</a>).</p>
|
||
|
||
<p><span id="mhc-haplotypes-in-reference"/></p>
<p><strong>What MHC haplotypes are used in the reference assembly?</strong></p>
<p>We use sequences defined by the <a href="https://www.ucl.ac.uk/cancer/research/department-cancer-biology/medical-genomics/medical-genomics-past-projects/mhc-haplotype">MHC consortium</a>.</p>
<table>
<thead>
<tr><th>GenBank ID</th><th>RefSeq ID</th><th>Assembly unit</th><th>Clone library</th><th>Cell line</th><th>Haplotype</th></tr>
</thead>
<tbody>
<tr><td><a href="/nuccore/CM000668.2">CM000668.2</a></td><td><a href="/nuccore/NC_000006.12">NC_000006.12</a></td><td>Primary</td><td>CH501</td><td>PGF</td><td>A3-B7-DR15</td></tr>
<tr><td><a href="/nuccore/GL000250.2">GL000250.2</a></td><td><a href="/nuccore/NT_167244.2">NT_167244.2</a></td><td>ALT_REF_LOCI_1</td><td>DAAP</td><td>APD</td><td>A1-B60-DR13</td></tr>
<tr><td><a href="/nuccore/GL000251.2">GL000251.2</a></td><td><a href="/nuccore/NT_113891.3">NT_113891.3</a></td><td>ALT_REF_LOCI_2</td><td>CH502</td><td>COX</td><td>A1-B8-DR3</td></tr>
<tr><td><a href="/nuccore/GL000252.2">GL000252.2</a></td><td><a href="/nuccore/NT_167245.2">NT_167245.2</a></td><td>ALT_REF_LOCI_3</td><td>DADB</td><td>DBB</td><td>A2-B57-DR7</td></tr>
<tr><td><a href="/nuccore/GL000253.2">GL000253.2</a></td><td><a href="/nuccore/NT_167246.2">NT_167246.2</a></td><td>ALT_REF_LOCI_4</td><td>DAMA</td><td>MANN</td><td>A29-B44-DR7</td></tr>
<tr><td><a href="/nuccore/GL000254.2">GL000254.2</a></td><td><a href="/nuccore/NT_167247.2">NT_167247.2</a></td><td>ALT_REF_LOCI_5</td><td>DAMC</td><td>MCF</td><td>A2-B62-DR4</td></tr>
<tr><td><a href="/nuccore/GL000255.2">GL000255.2</a></td><td><a href="/nuccore/NT_167248.2">NT_167248.2</a></td><td>ALT_REF_LOCI_6</td><td>DAQB</td><td>QBL</td><td>A26-B18-DR3</td></tr>
<tr><td><a href="/nuccore/GL000256.2">GL000256.2</a></td><td><a href="/nuccore/NT_167249.2">NT_167249.2</a></td><td>ALT_REF_LOCI_7</td><td>DASS</td><td>SSTO</td><td>A32-B44-DR4</td></tr>
</tbody>
</table>
<p><span id="lrc-kir-haplotypes-in-reference"/></p>
<p><strong>What LRC-KIR haplotypes are used in the reference assembly?</strong></p>
<p>LRC haplotypes provided as alternate loci or novel patches in the GRCh37 and GRCh38 assemblies are described in the following publications: <a href="/pubmed/19959527">Traherne et al., 2010</a>, <a href="/pubmed/17092261">Horton et al., 2006</a>, and <a href="/pubmed/18759923">Barrow and Trowsdale, 2008</a>.</p>
<p>More information can be found at the <a href="http://vega.sanger.ac.uk/info/data/LRC_Homo_sapiens.html">LRC Haplotype Project</a> and <a href="https://www.ebi.ac.uk/ipd/kir/">IPD-KIR</a>.</p>
<table>
<thead>
<tr><th>GenBank ID</th><th>RefSeq ID</th><th>Assembly unit</th><th>Clone library</th></tr>
</thead>
<tbody>
<tr><td><a href="/nucleotide/CM000681.2">CM000681.2</a></td><td><a href="/nuccore/NC_000019.10">NC_000019.10</a></td><td>Primary</td><td>CHM1</td></tr>
<tr><td><a href="/nucleotide/GL949746.1">GL949746.1</a></td><td><a href="/nuccore/NW_003571054.1">NW_003571054.1</a></td><td>ALT_REF_LOCI_1</td><td>COX1</td></tr>
<tr><td><a href="/nucleotide/GL949747.2">GL949747.2</a></td><td><a href="/nuccore/NW_003571055.2">NW_003571055.2</a></td><td>ALT_REF_LOCI_2</td><td>COX2</td></tr>
<tr><td><a href="/nucleotide/GL949748.2">GL949748.2</a></td><td><a href="/nuccore/NW_003571056.2">NW_003571056.2</a></td><td>ALT_REF_LOCI_3</td><td>LRC_i</td></tr>
<tr><td><a href="/nucleotide/GL949749.2">GL949749.2</a></td><td><a href="/nuccore/NW_003571057.2">NW_003571057.2</a></td><td>ALT_REF_LOCI_4</td><td>LRC_j</td></tr>
<tr><td><a href="/nucleotide/GL949750.2">GL949750.2</a></td><td><a href="/nuccore/NW_003571058.2">NW_003571058.2</a></td><td>ALT_REF_LOCI_5</td><td>LRC_s</td></tr>
<tr><td><a href="/nucleotide/GL949751.2">GL949751.2</a></td><td><a href="/nuccore/NW_003571059.2">NW_003571059.2</a></td><td>ALT_REF_LOCI_6</td><td>LRC_t</td></tr>
<tr><td><a href="/nucleotide/GL949752.1">GL949752.1</a></td><td><a href="/nuccore/NW_003571060.1">NW_003571060.1</a></td><td>ALT_REF_LOCI_7</td><td>PGF1</td></tr>
<tr><td><a href="/nucleotide/GL949753.2">GL949753.2</a></td><td><a href="/nuccore/NW_003571061.2">NW_003571061.2</a></td><td>ALT_REF_LOCI_8</td><td>PGF2</td></tr>
<tr><td><a href="/nucleotide/KI270938.1">KI270938.1</a></td><td><a href="/nuccore/NT_187693.1">NT_187693.1</a></td><td>ALT_REF_LOCI_9</td><td>mixed?</td></tr>
<tr><td><a href="/nucleotide/KI270882.1">KI270882.1</a></td><td><a href="/nuccore/NT_187636.1">NT_187636.1</a></td><td>ALT_REF_LOCI_10</td><td>FH15_B</td></tr>
<tr><td><a href="/nucleotide/KI270883.1">KI270883.1</a></td><td><a href="/nuccore/NT_187637.1">NT_187637.1</a></td><td>ALT_REF_LOCI_11</td><td>G085_A</td></tr>
<tr><td><a href="/nucleotide/KI270884.1">KI270884.1</a></td><td><a href="/nuccore/NT_187638.1">NT_187638.1</a></td><td>ALT_REF_LOCI_12</td><td>G085_BA1</td></tr>
<tr><td><a href="/nucleotide/KI270885.1">KI270885.1</a></td><td><a href="/nuccore/NT_187639.1">NT_187639.1</a></td><td>ALT_REF_LOCI_13</td><td>G248_A</td></tr>
<tr><td><a href="/nucleotide/KI270886.1">KI270886.1</a></td><td><a href="/nuccore/NT_187640.1">NT_187640.1</a></td><td>ALT_REF_LOCI_14</td><td>G248_BA2</td></tr>
<tr><td><a href="/nucleotide/KI270887.1">KI270887.1</a></td><td><a href="/nuccore/NT_187641.1">NT_187641.1</a></td><td>ALT_REF_LOCI_15</td><td>GRC212_AB</td></tr>
<tr><td><a href="/nucleotide/KI270888.1">KI270888.1</a></td><td><a href="/nuccore/NT_187642.1">NT_187642.1</a></td><td>ALT_REF_LOCI_16</td><td>GRC212_BA1</td></tr>
<tr><td><a href="/nucleotide/KI270889.1">KI270889.1</a></td><td><a href="/nuccore/NT_187643.1">NT_187643.1</a></td><td>ALT_REF_LOCI_17</td><td>LUCE_A</td></tr>
<tr><td><a href="/nucleotide/KI270890.1">KI270890.1</a></td><td><a href="/nuccore/NT_187644.1">NT_187644.1</a></td><td>ALT_REF_LOCI_18</td><td>LUCE_Bdel</td></tr>
<tr><td><a href="/nucleotide/KI270891.1">KI270891.1</a></td><td><a href="/nuccore/NT_187645.1">NT_187645.1</a></td><td>ALT_REF_LOCI_19</td><td>RSH_A</td></tr>
<tr><td><a href="/nucleotide/KI270914.1">KI270914.1</a></td><td><a href="/nuccore/NT_187668.1">NT_187668.1</a></td><td>ALT_REF_LOCI_20</td><td>RSH_BA2</td></tr>
<tr><td><a href="/nucleotide/KI270915.1">KI270915.1</a></td><td><a href="/nuccore/NT_187669.1">NT_187669.1</a></td><td>ALT_REF_LOCI_21</td><td>T7526_A</td></tr>
<tr><td><a href="/nucleotide/KI270916.1">KI270916.1</a></td><td><a href="/nuccore/NT_187670.1">NT_187670.1</a></td><td>ALT_REF_LOCI_22</td><td>T7526_Bdel</td></tr>
<tr><td><a href="/nucleotide/KI270917.1">KI270917.1</a></td><td><a href="/nuccore/NT_187671.1">NT_187671.1</a></td><td>ALT_REF_LOCI_23</td><td>ABC08_A</td></tr>
<tr><td><a href="/nucleotide/KI270918.1">KI270918.1</a></td><td><a href="/nuccore/NT_187672.1">NT_187672.1</a></td><td>ALT_REF_LOCI_24</td><td>ABC08_AB</td></tr>
<tr><td><a href="/nucleotide/KI270919.1">KI270919.1</a></td><td><a href="/nuccore/NT_187673.1">NT_187673.1</a></td><td>ALT_REF_LOCI_25</td><td>ABC08_AB</td></tr>
<tr><td><a href="/nucleotide/KI270920.1">KI270920.1</a></td><td><a href="/nuccore/NT_187674.1">NT_187674.1</a></td><td>ALT_REF_LOCI_26</td><td>FH05_A</td></tr>
<tr><td><a href="/nucleotide/KI270921.1">KI270921.1</a></td><td><a href="/nuccore/NT_187675.1">NT_187675.1</a></td><td>ALT_REF_LOCI_27</td><td>FH05_B</td></tr>
<tr><td><a href="/nucleotide/KI270922.1">KI270922.1</a></td><td><a href="/nuccore/NT_187676.1">NT_187676.1</a></td><td>ALT_REF_LOCI_28</td><td>FH06_A</td></tr>
<tr><td><a href="/nucleotide/KI270923.1">KI270923.1</a></td><td><a href="/nuccore/NT_187677.1">NT_187677.1</a></td><td>ALT_REF_LOCI_29</td><td>FH06_BA1</td></tr>
<tr><td><a href="/nucleotide/KI270929.1">KI270929.1</a></td><td><a href="/nuccore/NT_187683.1">NT_187683.1</a></td><td>ALT_REF_LOCI_30</td><td>FH08_A</td></tr>
<tr><td><a href="/nucleotide/KI270930.1">KI270930.1</a></td><td><a href="/nuccore/NT_187684.1">NT_187684.1</a></td><td>ALT_REF_LOCI_31</td><td>FH08_BAX</td></tr>
<tr><td><a href="/nucleotide/KI270931.1">KI270931.1</a></td><td><a href="/nuccore/NT_187685.1">NT_187685.1</a></td><td>ALT_REF_LOCI_32</td><td>FH13_A</td></tr>
<tr><td><a href="/nucleotide/KI270932.1">KI270932.1</a></td><td><a href="/nuccore/NT_187686.1">NT_187686.1</a></td><td>ALT_REF_LOCI_33</td><td>FH13_BA2</td></tr>
<tr><td><a href="/nucleotide/KI270933.1">KI270933.1</a></td><td><a href="/nuccore/NT_187687.1">NT_187687.1</a></td><td>ALT_REF_LOCI_34</td><td>FH15_A</td></tr>
<tr><td><a href="/nucleotide/GL000209.2">GL000209.2</a></td><td><a href="/nuccore/NT_113949.2">NT_113949.2</a></td><td>ALT_REF_LOCI_35</td><td>RP5_B</td></tr>
<tr><td><a href="/nucleotide/KV575252.1">KV575252.1</a></td><td><a href="/nuccore/NW_016107306.1">NW_016107306.1</a></td><td>PATCHES</td><td>B05-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575253.1">KV575253.1</a></td><td><a href="/nuccore/NW_016107307.1">NW_016107307.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575246.1">KV575246.1</a></td><td><a href="/nuccore/NW_016107300.1">NW_016107300.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575251.1">KV575251.1</a></td><td><a href="/nuccore/NW_016107305.1">NW_016107305.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575255.1">KV575255.1</a></td><td><a href="/nuccore/NW_016107309.1">NW_016107309.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575259.1">KV575259.1</a></td><td><a href="/nuccore/NW_016107313.1">NW_016107313.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575247.1">KV575247.1</a></td><td><a href="/nuccore/NW_016107301.1">NW_016107301.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575248.1">KV575248.1</a></td><td><a href="/nuccore/NW_016107302.1">NW_016107302.1</a></td><td>PATCHES</td><td>A01-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575256.1">KV575256.1</a></td><td><a href="/nuccore/NW_016107310.1">NW_016107310.1</a></td><td>PATCHES</td><td>B01-tB01</td></tr>
<tr><td><a href="/nucleotide/KV575258.1">KV575258.1</a></td><td><a href="/nuccore/NW_016107312.1">NW_016107312.1</a></td><td>PATCHES</td><td>B04-tB03</td></tr>
<tr><td><a href="/nucleotide/KV575254.1">KV575254.1</a></td><td><a href="/nuccore/NW_016107308.1">NW_016107308.1</a></td><td>PATCHES</td><td>A03-tB02</td></tr>
<tr><td><a href="/nucleotide/KV575260.1">KV575260.1</a></td><td><a href="/nuccore/NW_016107314.1">NW_016107314.1</a></td><td>PATCHES</td><td>B02-tA01</td></tr>
<tr><td><a href="/nucleotide/KV575249.1">KV575249.1</a></td><td><a href="/nuccore/NW_016107303.1">NW_016107303.1</a></td><td>PATCHES</td><td>A01-tB04</td></tr>
<tr><td><a href="/nucleotide/KV575250.1">KV575250.1</a></td><td><a href="/nuccore/NW_016107304.1">NW_016107304.1</a></td><td>PATCHES</td><td>A01-tB01</td></tr>
<tr><td><a href="/nucleotide/KV575257.1">KV575257.1</a></td><td><a href="/nuccore/NW_016107311.1">NW_016107311.1</a></td><td>PATCHES</td><td>A04</td></tr>
</tbody>
</table>
<p><span id="format-reference-data-for-read-alignment"/></p>
<p><strong>Can I get reference assembly data sets formatted for use by sequence read alignment pipelines?</strong></p>
<p>The GenBank FTP site provides assembly data for GRCh37.p13, GRCh38 and GRCm38 that are formatted and packaged for use with tools in several common sequence analysis pipelines, including BWA, Samtools and Bowtie. Known as analysis sets, the various data packages include copies of the assemblies with and without alternate loci scaffolds, with and without additional sequences commonly used as alignment targets, such as chr. EBV and the decoy sequences constructed by Heng Li for GRCh37 (<a href="https://media.nature.com/full/nature-assets/nature/journal/v526/n7571/extref/nature15393-s1.pdf">https://media.nature.com/full/nature-assets/nature/journal/v526/n7571/extref/nature15393-s1.pdf</a>, section 3.6.1) and GRCh38 (<a href="https://www.ncbi.nlm.nih.gov/assembly/GCA_000786075.2">GRCh38 decoys</a>), and with masking of genomic regions such as PAR and the centromeres. For complete information, see the README file provided with each analysis set.</p>
<p>Links to analysis sets:</p>
<p><a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/">Human genome assembly GRCh38</a></p>
<p><a href="https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Homo_sapiens/GRCh37.p13/">Human genome assembly GRCh37.p13</a></p>
<p><a href="https://ftp.ncbi.nlm.nih.gov/genomes/archive/old_genbank/Eukaryotes/vertebrates_mammals/Mus_musculus/GRCm38.p3/seqs_for_alignment_pipelines/">Mouse genome assembly GRCm38.p3</a> </p>
<p><span id="human-reference-genome-and-common-alleles"/></p>
<p><strong>Does the human reference genome represent the common/major allele at all chromosomal loci?</strong></p>
<p>The human reference genome is a composite genome, derived from the genomes of several different individuals. The reference assembly chromosomes overwhelmingly represent the alleles found in their underlying component sequences, which are derived from these various DNA sources (see FAQ <a href="#dna-source-of-human-reference-genome">Where can I get information about the DNA sources for the human reference genome?</a>). As part of its curation effort, the GRC strives to ensure that the reference genome represents alleles that are not unique to a specific individual or universally rare, but are commonly found in one or more populations. Some loci from GRCh37 were updated in GRCh38 as part of this work (see <a href="https://www.biorxiv.org/content/early/2016/08/30/072116">“Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly”</a>). As sequence from more humans, representing even more populations, becomes available, we continue with this effort.</p>
<p>It should be noted that in some instances, the most “common” allele is neither the longest nor the ancestral allele, two other reference representations that are often requested of the GRC. The GRC uses alternate loci scaffolds to provide additional sequence representations for diverse genomic regions. By including multiple representations of such loci, the human reference genome assembly is better able to represent population genomic diversity. Thus, in some instances, the most “common”, longest or ancestral allele may be found on an alternate loci scaffold instead of the chromosome. For additional information on alternate loci, see the FAQ <a href="#difference-between-alternate-loci-and-novel-patch">What are alternate loci and novel patches?</a>.</p>
<p><span id="reference-assembly-method"/></p>
<p><strong>What assembly method was used to create the reference assembly?</strong></p>
<p>The reference assembly is distinguished from most other human assemblies by virtue of being a clone-based assembly composed of DNA from multiple individuals, rather than a whole genome shotgun assembly of a single individual. As a result, each chromosome assembly is a haploid mosaic, rather than a haploid consensus, in which valid haplotypes may transition at clone boundaries. For additional information, please see:
<a href="https://www.ncbi.nlm.nih.gov/pubmed/15496913">Finishing the euchromatic sequence of the human genome</a></p>
<p><span id="grc-assembly-resources"/></p>
<p><strong>What types of assembly resources are available from the GRC?</strong></p>
<p>The <a href="https://www.ncbi.nlm.nih.gov/grc">GRC webpage</a> provides users with a summary of all assembly regions under review, announcements about assembly release plans, links to download the assembly data and assembly statistics. A GRC blog discusses recent curation events and highlights genomic regions of interest. On the GRC website, organism-specific data (<a href="https://www.ncbi.nlm.nih.gov/grc/human">Human</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/mouse">Mouse</a>, <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish">Zebrafish</a>, and <a href="https://www.ncbi.nlm.nih.gov/grc/chicken">Chicken</a>) are provided under separate tabs:</p>
<p><a href="https://www.ncbi.nlm.nih.gov/grc/human/issues">https://www.ncbi.nlm.nih.gov/grc/human/issues</a></p>
<p><a href="https://www.ncbi.nlm.nih.gov/grc/mouse/issues">https://www.ncbi.nlm.nih.gov/grc/mouse/issues</a></p>
<p><a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish/issues">https://www.ncbi.nlm.nih.gov/grc/zebrafish/issues</a></p>
<p><a href="https://www.ncbi.nlm.nih.gov/grc/chicken/issues">https://www.ncbi.nlm.nih.gov/grc/chicken/issues</a></p>
<p>Assembly regions under GRC curation can also be viewed in tracks in several common genome browsers.
A GRC-provided <a href="http://ngs.sanger.ac.uk/production/grit/track_hub/hub.txt">Track Hub</a> is available in the <a href="http://www.ensembl.org/index.html">Ensembl</a> and <a href="https://genome.ucsc.edu/">UCSC</a> browsers. Examples of individual tracks available in the GRC hub include GRC curation issues, alignments between the primary assembly and alternate loci and patches, optical mapping data and clone sequence anomalies.</p>
<p>At NCBI, the “Assembly Support” track set, accessed via the “Tracks” menu in the <a href="https://www.ncbi.nlm.nih.gov/genome/gdv/">Genome Data Viewer</a> (GDV), 1000 Genomes and Variation Viewer browsers, provides many of the same tracks as the GRC track hub. For more information on using NCBI Track Sets, see <a href="https://www.youtube.com/watch?v=Q9kOLBHZR4s">https://www.youtube.com/watch?v=Q9kOLBHZR4s</a> and <a href="https://www.ncbi.nlm.nih.gov/tools/sviewer/faq/#tracksets">https://www.ncbi.nlm.nih.gov/tools/sviewer/faq/#tracksets</a>.</p>
<p><span id="issue-type"/>
<strong>What are the different types of GRC curation issues?</strong></p>
<p>The GRC has defined the following categories for curation issues: </p>
<ul>
<li>Clone Problem: Issue is related to a specific clone</li>
<li>Variation: No error, problem is related to biological variation</li>
<li>Path Problem: Issue is related to a tiling path problem</li>
<li>Localization Problem: Issue is related to a sequence localization (unlocalized or unplaced)</li>
<li>Missing sequence: Issue is related to sequence that is missing from assembly</li>
<li>Gap: Issue is related to a specific assembly gap (inter- or intra-scaffold)</li>
<li>GRC Housekeeping: For general assembly improvement issues not affiliated with reported assembly errors. These include, but are not limited to, YAC replacements (not associated with a clone problem) and switch point updates not associated with clone or path problems</li>
<li>Unknown: Issue type is unknown</li>
</ul>
<p><span id="issue-status"/></p>
<p><strong>What is meant by the different statuses for GRC curation issues?</strong></p>
<p>The progress of an issue in the GRC curation workflow is defined by its status: </p>
<ul>
<li>Open: New issue, not yet reviewed</li>
<li>Under Review: Issue has undergone initial review, and it has been determined that work is required</li>
<li>Awaiting Elec Data: Work on issue has started, and is awaiting electronic data from trace analysis, alignment review, Genome Workbench analysis, PGP Viewer analysis, optical map (OM) review, etc.</li>
<li>Awaiting Exptl Data: Work on issue has started, and is awaiting experimental data from PCR, clone sequencing and/or mapping</li>
<li>Awaiting External Info: Work on issue has started, and an external (non-GRC) consultation has been initiated</li>
<li>Continuing Investigation: Issue requires further work</li>
<li>Resolved: Issue has been addressed, and all relevant data changes have been submitted to the GRC database</li>
<li>Reopened: Issue was previously resolved, but review indicates the problem has not been corrected</li>
<li>Stalled: Issue has been reviewed and determined to be unresolvable with current technologies</li>
</ul>
<p><span id="genes-annotations-in-reference-genome"/></p>
<p><strong>Where can I find gene content and annotations of the genome reference assembly?</strong></p>
<p>The GRC produces and curates reference assemblies, but does not perform gene annotation.
You can obtain the most recent RefSeq annotation by the NCBI Genome annotation pipeline for <a href="https://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/">Human</a> (GRCh38), <a href="https://ftp.ncbi.nlm.nih.gov/genomes/Mus_musculus/GFF/">Mouse</a> (GRCm38), <a href="https://ftp.ncbi.nlm.nih.gov/genomes/Danio_rerio/GFF/">Zebrafish</a> (GRCz11), and <a href="https://ftp.ncbi.nlm.nih.gov/genomes/Gallus_gallus/GFF/">Chicken</a> (GRCg6a).
GENCODE offers annotations for <a href="https://www.gencodegenes.org/releases/current.html">Human</a> and <a href="https://www.gencodegenes.org/mouse_releases/current.html">Mouse</a>, while Ensembl annotates zebrafish and chicken.
UCSC provides annotations for <a href="http://hgdownload.soe.ucsc.edu/downloads.html#human">Human</a>, <a href="http://hgdownload.soe.ucsc.edu/downloads.html#mouse">Mouse</a>, <a href="http://hgdownload.soe.ucsc.edu/downloads.html#zebrafish">Zebrafish</a> and <a href="http://hgdownload.soe.ucsc.edu/downloads.html#chicken">Chicken</a>. </p>
<p><span id="ribosomal-dna-in-reference"/></p>
<p><strong>Does the human reference assembly contain representation for ribosomal DNA sequences?</strong></p>
<p>Due to the highly repetitive nature of the 5S rDNA cluster on 1q42, and the 45S cluster on the p-arms of the acrocentric chromosomes, we are unable to provide a complete, biologically accurate, representation for these regions in the human reference assembly with currently available resources. However, the GRCh38 reference assembly does provide a representation for a limited number of repeat copies in each cluster.
Sequence from the 5S cluster (representing ~19 copies) is located at NC_000001.11 (GenBank accession: CM000663.2): 228,408,802-228,664,283. We recognize that this is a gross under-representation of this cluster, which is estimated to occur at ~100 repeats (<a href="https://www.ncbi.nlm.nih.gov/pubmed/?term=18025267">PMID: 18025267</a>), and will address this as resources become available.
Sequence from the 45S cluster on the acrocentric p-arms is found on the unplaced scaffold NT_167214.1 (GenBank accession: GL000220.1). This includes ~1.5 copies of the 45S cluster. At this time, we do not know from which of the acrocentric chromosomes this sequence derives.
You can track ongoing work for the 45S (<a href="https://www.ncbi.nlm.nih.gov/grc/human/issues/HG-1101">HG-1101</a>) and 5S (<a href="https://www.ncbi.nlm.nih.gov/grc/human/issues/HG-2002">HG-2002</a>) regions at the GRC website. </p>
<p>In the GRCm38 assembly, the Rn45s 45S pre-ribosomal RNA is not annotated, but there is a related sequence (Rn18s-rs5) located on chromosome 17. You can track GRC ongoing work on the issue related to Rn45s 45S (<a href="https://www.ncbi.nlm.nih.gov/grc/mouse/issues/MG-4232">MG-4232</a>). We recognize the importance of providing accurate representations for these important clusters and are continuing to look at new technologies (e.g. long read sequencing, optical maps) that may help us improve the reference for these regions.</p>
<p><span id="grc-issues"/></p>
<p><strong>How do I find the latest data about reference assembly problems?</strong></p>
<p>The latest information about assembly problems, ongoing work and other curation issues related to GRC-managed genome assemblies is available on the GRC website:</p>
<p>Human: <a href="https://www.ncbi.nlm.nih.gov/grc/human/issues">https://www.ncbi.nlm.nih.gov/grc/human/issues</a></p>
<p>Mouse: <a href="https://www.ncbi.nlm.nih.gov/grc/mouse/issues">https://www.ncbi.nlm.nih.gov/grc/mouse/issues</a></p>
<p>Zebrafish: <a href="https://www.ncbi.nlm.nih.gov/grc/zebrafish/issues">https://www.ncbi.nlm.nih.gov/grc/zebrafish/issues</a></p>
<p>Chicken: <a href="https://www.ncbi.nlm.nih.gov/grc/chicken/issues">https://www.ncbi.nlm.nih.gov/grc/chicken/issues</a></p>
<p>You can search for issues and regions under review. The GRC provides a brief description of each issue, its resolution status and its mapping to the current and previous reference assembly versions. For more details, see the “<a href="http://genomeref.blogspot.com/2015/02/grc-website-update-genome-issues-under.html">Genome Issues Under Review</a>” and “<a href="http://genomeref.blogspot.com/2015/02/grc-website-individual-genome-issues.html">Individual Genome Issues</a>” posts on the <a href="http://genomeref.blogspot.com/">GRC blog</a>.</p>
<p><span id="reporting-assembly-problems"/></p>
<p><strong>How can I report an assembly problem?</strong></p>
<p>If you find an error in the assembly or a variant region that needs to be better presented, please let us know at “<a href="https://www.ncbi.nlm.nih.gov/grc/report-an-issue">Report an Issue</a>”. If you have a question for GRC, we welcome your comments at “<a href="https://www.ncbi.nlm.nih.gov/grc/contact-us">Contact us</a>”. </p>
<p><span id="how-to-cite"/></p>
<p><strong>How do I cite the GRC or a reference assembly?</strong></p>
<p>You can cite the <a href="https://www.ncbi.nlm.nih.gov/grc">GRC website</a> or the articles “<a href="https://www.ncbi.nlm.nih.gov/pubmed/21750661">Modernizing reference genome assemblies</a>” (for GRCh37) and “<a href="https://www.biorxiv.org/content/early/2016/08/30/072116">Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly</a>” (for GRCh38).</p>
<hr/>
<p/>
<h4 id="bioinformatics-tools_1">Bioinformatics Tools</h4>
<p><span id="mapping-between-releases-assembly-alignments-or-converting-reference-and-other-assemblies"/></p>
<p><strong>What tools can I use to map/convert data between different releases of the reference assembly or between the reference and other genome assemblies?</strong></p>
<p>UCSC offers the <a href="http://genome.ucsc.edu/cgi-bin/hgLiftOver">LiftOver</a> tool, which converts genome coordinates and genome annotation files between assemblies. </p>
<p>Ensembl offers the <a href="http://useast.ensembl.org/Homo_sapiens/Tools/AssemblyConverter?db=core">Assembly Converter</a> to convert coordinates between different releases of a genome assembly.</p>
<p>Remapping results are straightforward and identical for regions of the genome that align well and are free of complications from repeats or structural variation, but can be problematic for complicated genomic regions, such as regions that are duplicated or collapsed between old and new assemblies, or newly added regions, at which there may be reads missing or mis-mapped in the older assembly. Since all remapping tools are limited by their reliance on alignment between old and new assemblies, in such regions de novo read mapping will likely be more accurate than remapping.</p>
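<p>The dependence of remapping on alignment can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used by LiftOver or the Assembly Converter: the Block type, the BLOCKS list and the remap function are hypothetical names, and the coordinates are invented. A position is converted only if it falls inside an ungapped aligned block; anywhere else there is nothing to convert it with.</p>

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One ungapped alignment block (1-based, inclusive start coordinates)."""
    old_start: int
    new_start: int
    length: int


# Hypothetical alignment between an old and a new assembly. Old positions
# 1001-1500 have no aligned counterpart, for example because the region was
# restructured in the new assembly.
BLOCKS = [
    Block(old_start=1, new_start=1, length=1000),
    Block(old_start=1501, new_start=1201, length=500),
]


def remap(pos, blocks):
    """Return the new-assembly coordinate for pos, or None if pos is unaligned."""
    for block in blocks:
        if pos in range(block.old_start, block.old_start + block.length):
            # Inside an aligned block: keep the same offset from the block start.
            return block.new_start + (pos - block.old_start)
    return None


if __name__ == "__main__":
    for position in (500, 1200, 1600):
        print(position, remap(position, BLOCKS))
```

<p>Under these assumptions, a feature at old position 1200 cannot be remapped at all, which is the kind of case where aligning the reads directly to the new assembly is the better option.</p>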
</div>
<script>
if (jQuery) {
    jQuery(function(){
        (function($){
            $('#grc-cms-content table').ncbigrid();
        })(jQuery);
    })
}
</script>
</div>
</div>
<div class="cleaner"></div>
<div id="footer">
<ul>
<li><a title="Get GRC data via FTP" href="https://ftp.ncbi.nlm.nih.gov/pub/grc/">FTP</a></li>
<li><a target="_blank" href="https://www.genome.gov">NHGRI</a></li>
<li><a target="_blank" href="http://www.wellcome.ac.uk">Wellcome Sanger Institute</a></li>
<li><a target="_blank" href="https://www.hhs.gov">HHS</a></li>
<li><a target="_blank" href="https://www.nih.gov">NIH</a></li>
<li><a target="_blank" href="https://www.nih.gov/web-policies-notices">Accessibility</a></li>
<li class="last"><a target="_blank" href="https://www.hhs.gov/vulnerability-disclosure-policy/index.html">HHS Vulnerability Disclosure</a></li>
</ul>
</div>
</div>
</div><!-- end page -->
<script type="text/javascript" src="/portal/portal3rc.fcgi/rlib/js/InstrumentNCBIBaseJS/InstrumentPageStarterJS.js"></script>
</body>
</html>