Am J Hum Genet. 2009 Feb;84(2):148-61.
doi: 10.1016/j.ajhg.2008.12.014. Epub 2009 Jan 22.

Population analysis of large copy number variants and hotspots of human genetic disease

Andy Itsara et al. Am J Hum Genet. 2009 Feb.

Abstract

Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in approximately 2500 individuals by using Illumina SNP data, with an emphasis on "hotspots" prone to recurrent mutations. We find variants larger than 500 kb in 5%-10% of individuals and variants greater than 1 Mb in 1%-2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%-1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

Figures

Figure 1
Examples of CNVs by Location and Type. Typical examples of duplications (top row), heterozygous deletions (middle row), and homozygous deletions (bottom row) as detected by using SNP arrays, classified as rearrangement hotspot mediated (A), hotspot associated (B), or nonhotspot (C) (see Material and Methods for definitions). The plots show LogR ratio (vertical bars), b-allele frequency (solid points), segmental duplications in the reference assembly (green blocks), and the locations of rearrangement hotspots (purple brackets). CNVs are highlighted by gray rectangles, contrasting the LogR ratio (red) and b-allele frequency (blue) with flanking regions (black). Duplications are characterized by increased LogR ratio and heterozygous b-allele frequencies in multiple clusters, corresponding to “AAB” and “ABB” SNP genotypes, instead of a single cluster at 0.5 (“AB”). Heterozygous deletions have decreased LogR ratio and display a loss of heterozygosity. Homozygous deletions have an extremely low LogR ratio and display b-allele frequencies that fail to cluster.
Figure 2
Autosomal Landscape of Large CNVs. Large CNVs are >100 kbp. Duplications (blue), deletions (red), and homozygous deletions (black) are depicted based on analysis of 2493 individuals. Chromosomes are drawn to scale (tick marks indicate 10 Mb), with the position of centromeres (gray) and predicted rearrangement hotspots (green lines connected by a diagonal) indicated. Those hotspots associated with disease are highlighted in purple. CNVs observed ten or more times for a given locus are cropped.
Figure 3
Cumulative Distributions of the Largest CNV per Individual According to Study. For 10 kb to 1 Mb in 10 kb intervals, the fraction of individuals containing one or more CNVs (y axis) of size greater than or equal to a given size (x axis) is plotted according to study. Note that probe density has a significant impact at smaller CNV sizes, but that the cumulative distributions for blood-derived (PARC-PRINCE) and cell-line (PARC-CAP) DNA are similar. The average number of CNVs per individual varies by study from 3 to 7 (Figure S2).
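The quantity plotted in Figure 3 can be sketched as follows. This is a minimal illustration of the caption's definition, not the authors' code: the function name, the input layout (a mapping from individual ID to a list of CNV lengths in base pairs), and the toy cohort are all hypothetical.

```python
def largest_cnv_cumulative(cnv_sizes_by_individual,
                           start=10_000, stop=1_000_000, step=10_000):
    """Return (threshold, fraction) pairs.

    For each size threshold from `start` to `stop` (inclusive) in `step`
    increments, the fraction is the share of individuals carrying at least
    one CNV whose length is greater than or equal to the threshold.
    """
    n = len(cnv_sizes_by_individual)
    if n == 0:
        raise ValueError("at least one individual is required")

    # An individual has a CNV >= threshold exactly when their largest CNV is.
    # Individuals with no CNV calls count as size 0 (assumption).
    largest = [max(sizes, default=0)
               for sizes in cnv_sizes_by_individual.values()]

    curve = []
    for threshold in range(start, stop + 1, step):
        carriers = sum(1 for size in largest if size >= threshold)
        curve.append((threshold, carriers / n))
    return curve


# Hypothetical toy cohort: CNV lengths in base pairs per individual.
cohort = {
    "ind1": [15_000, 120_000],
    "ind2": [600_000],
    "ind3": [],
    "ind4": [45_000, 1_200_000],
}
curve = dict(largest_cnv_cumulative(cohort))
print(curve[10_000], curve[500_000], curve[1_000_000])
```

Computing one such curve per study and overlaying them would give a plot of the kind the caption describes.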
Figure 4
CNV Length, Gene Content, and Frequency Distributions. CNVs were plotted according to event type (color), length (y axis), frequency in the population (x axis, number of individuals from n = 2493), and number of RefSeq genes affected (circle size). To facilitate comparison across different platforms, events from different individuals were considered the same if their putative breakpoints were within 50 kb of one another. CNVs related to previously reported disease-causing variants are highlighted.
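The 50 kb breakpoint rule in the Figure 4 caption can be illustrated with a short sketch. The caption does not say how events were grouped in practice, so several choices here are assumptions: both breakpoints must agree within the tolerance, events must share chromosome and type, and each call is compared greedily against the first member of an existing group. The `Cnv` record, the function names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cnv:
    individual: str
    chrom: str
    start: int  # putative proximal breakpoint (bp)
    end: int    # putative distal breakpoint (bp)
    kind: str   # "dup", "del", or "homdel"


def same_event(a, b, tolerance=50_000):
    """True if both breakpoints of a and b lie within `tolerance` bp."""
    return (a.chrom == b.chrom
            and a.kind == b.kind  # assumption: types are not merged
            and abs(a.start - b.start) <= tolerance
            and abs(a.end - b.end) <= tolerance)


def group_events(cnvs, tolerance=50_000):
    """Greedily group calls, comparing each to a group's first member."""
    groups = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        for group in groups:
            if same_event(group[0], cnv, tolerance):
                group.append(cnv)
                break
        else:
            groups.append([cnv])
    return groups


def event_frequencies(cnvs, tolerance=50_000):
    """Number of distinct individuals carrying each grouped event."""
    return [len({c.individual for c in group})
            for group in group_events(cnvs, tolerance)]


# Hypothetical calls; coordinates are illustrative only.
calls = [
    Cnv("ind1", "chr15", 20_000_000, 21_500_000, "del"),
    Cnv("ind2", "chr15", 20_030_000, 21_520_000, "del"),
    Cnv("ind3", "chr15", 20_200_000, 21_500_000, "del"),
    Cnv("ind4", "chr15", 20_000_000, 21_500_000, "dup"),
]
groups = group_events(calls)
counts = event_frequencies(calls)
print(len(groups), sorted(counts))
```

Comparing against a fixed group representative keeps chains of slightly shifted calls from merging into one event; a transitive (single-linkage) grouping would be an equally plausible reading of the caption.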
Figure 5
Comparison of CNVs >100 kb in Affected versus Unaffected Individuals at Four Selected Loci Scoring Highly for Potential Pathogenicity. Duplications, deletions, and homozygous deletions are plotted blue, red, and black, respectively, in human reference assembly coordinates (x axis in each plot). Tick marks are spaced 10 Mb apart, centromeres are indicated in gray, and hotspots are shown as two green vertical lines connected by a green diagonal. Scale in bottom right indicates 1 Mb. Rearrangement hotspots that have been associated with disease are highlighted in purple. Plotting is cropped after 30 overlapping CNVs at a given locus. (A and B) Known disease loci. (A) 22q11-12. Disease hotspots (left to right): VCFS, critical region; VCFS, distal region, Distal 22q11 deletion syndrome (MIM 611867). (B) 15q11-q14. Disease hotspots: Prader-Willi/Angelman Syndrome BP1-BP3 (MIM 176270, 105830), and 15q13.3 (MIM 612001). (C and D) Candidate disease loci. (C) 16p11-13. An inversion-containing region found in 7/8 analyzed HapMap samples has been colored orange along the x axis. Disease hotspots from left to right: 16p13 deletion syndrome distal and proximal regions, 16p11.2-p12.2 deletion syndrome, and 16p11 region associated with autism. (D) 15q22-25. Disease hotspots from left to right: 15q24 deletion syndrome BP0-BP1, BP1-BP2, and BP2-BP3.
