Nature. 2020 Oct;586(7831):757-762.
doi: 10.1038/s41586-020-2832-5. Epub 2020 Oct 14.

Evidence for 28 genetic disorders discovered by combining healthcare and research data

Joanna Kaplanis et al. Nature. 2020 Oct.

Abstract

De novo mutations in protein-coding genes are a well-established cause of developmental disorders [1]. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations [1,2]. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
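
The idea of a simulation-based test for gene-specific enrichment can be illustrated with a minimal sketch. This is not the authors' implementation: the per-class mutation rates, the severity weights, the Poisson null model and the factor of 2 (two copies of an autosomal gene per child) are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulated_enrichment_p(observed_score, class_rates, class_weights,
                           n_trios, n_sims=5000, seed=0):
    """Empirical one-sided p-value for a weighted de novo mutation score.

    Under the assumed null, the number of de novo mutations of each class
    in a gene is Poisson with mean 2 * n_trios * rate. Each simulated
    mutation adds its class weight to the gene's score.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        score = 0.0
        for cls, rate in class_rates.items():
            expected = 2 * n_trios * rate
            score += class_weights[cls] * sample_poisson(expected, rng)
        if score >= observed_score:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical gene: illustrative rates and weights, not values from the paper.
hypothetical_rates = {"ptv": 1e-6, "missense": 1e-5}
hypothetical_weights = {"ptv": 1.0, "missense": 0.3}

# Observed score for 5 PTV and 3 missense de novo mutations.
observed = 5 * hypothetical_weights["ptv"] + 3 * hypothetical_weights["missense"]
p_value = simulated_enrichment_p(observed, hypothetical_rates,
                                 hypothetical_weights, n_trios=31058)
print(p_value)
```

With a finite number of simulations the smallest reportable p-value is 1 / (n_sims + 1), so resolving genome-wide significance by brute force would need far more simulations than this sketch uses.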

Conflict of interest statement

Competing interests

Z.Z., K.J.A., R.I.T., J.J., and K.R. are employees of GeneDx. J.J. and K.R. are shareholders of OPKO. M.E.H. is a co-founder of, consultant to, and holds shares in, Congenica Ltd, a genetics diagnostic company.

Figures

Extended Data Figure 1. Exploring the remaining number of DD genes.
(a) Number of significant genes obtained by downsampling the full cohort and running DeNovoWEST’s enrichment test. (b) Results from modelling the likelihood of the observed distribution of de novo PTV mutations. This model varies the number of remaining haploinsufficient (HI) DD genes and the PTV enrichment in those remaining genes. The 50% credible interval is shown in red and the 90% credible interval is shown in orange. Note that the median PTV enrichment in genes that are significant and known to operate via a loss-of-function mechanism (shown with an arrow) is 39.7.
Figure 1. Results of DeNovoWEST analysis.
(a) Comparison of p-values using the new method (DeNovoWEST) versus the previous method (mupit), run on the full cohort. Dashed lines indicate the threshold for genome-wide significance (one-sided, Bonferroni correction). Point size is proportional to the number of nonsynonymous DNMs in our cohort (nsyn). The number of genes that fall into each quadrant is annotated. (b) The number of missense and PTV DNMs in the novel genes. Point size is proportional to the -log10(p-value) from analysis of the undiagnosed subset. Point colour corresponds to which test p-value was more significant: nonsynonymous enrichment test in blue (pEnrich), missense enrichment and clustering test in red (pMEC). (c) The distribution of significant p-values from analysis of the undiagnosed subset for discordant and novel genes; p-values for consensus genes come from the full cohort analysis. The number of genes in each p-value bin is coloured by diagnostic gene group (n = 285 significant genes; one-sided p-values, Bonferroni corrected). Green represents the remaining fraction of cases expected to have a pathogenic de novo coding mutation and grey is the fraction of cases that are likely to be explained by other factors. (d) The fraction of cases (n = 31,058) with a nonsynonymous mutation in each diagnostic gene group. (e) The fraction of cases with a nonsynonymous mutation in each diagnostic gene group split by sex (n = 13,636 female and 17,422 male). In all panels, black, blue and orange represent consensus, discordant and novel genes respectively.
Figure 2. Properties of novel genes.
(a) The phenotypic similarity of patients with DNMs in novel and consensus genes. Random phenotypic similarity was calculated from random pairs of patients. Cases with DNMs in the same novel gene were less phenotypically similar than cases with DNMs in the same consensus gene (p = 2.3 × 10^-11, two-sided Wilcoxon rank-sum test). (b) Comparison of properties of consensus (n = 380) and novel (n = 28) DD genes known to be differential between consensus and non-DD genes (95% bootstrapped confidence intervals shown).
Figure 3. Factors influencing power.
(a) PTV mutability is significantly lower (p = 4.6 × 10^-68, two-sided Wilcoxon rank-sum test) in genes that are not significantly DD-associated (blue) than in DD-associated genes (red). Median depicted with a black horizontal line. (b) Distribution of PTV enrichment in significant, likely haploinsufficient, genes by category (118 consensus, 23 discordant, 8 novel genes). Lower and upper hinges correspond to first and third quartiles. Median depicted by a horizontal grey line. The upper and lower whiskers extend 1.5 times the interquartile range. (c) Comparison of PTV enrichment in our cohort vs the PTV to synonymous ratio in gnomAD, for genes that are significantly PTV-enriched in our cohort (without variant weighting; n = 156 genes). PTV enrichment bins labelled with log10(enrichment). Dashed line indicates regression. Confidence intervals are 95% of the rate ratio. (d) Overall PTV enrichment across genes grouped by likelihood of presenting with a structural malformation on prenatal ultrasound (145 low, 65 medium, 6 high genes). PTV enrichment is significantly higher for genes with a low likelihood compared to other genes (p = 4.6 × 10^-5, two-sided Poisson test). Poisson 95% confidence intervals shown.
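
The captions above refer repeatedly to PTV enrichment and to Poisson tests. As a rough illustration only, enrichment can be read as the ratio of observed to expected de novo PTVs, and an excess can be assessed with a Poisson upper-tail probability. The sketch below assumes that reading; it is a single-gene, one-sided calculation, not the two-sided group comparison reported for panel (d), and the counts and function names are hypothetical.

```python
import math


def ptv_enrichment(observed, expected):
    """Ratio of observed to expected de novo PTV counts (assumed definition)."""
    return observed / expected


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    term = math.exp(-expected)  # P(X = 0)
    below = 0.0
    for i in range(observed):
        below += term                # accumulate P(X = i)
        term *= expected / (i + 1)   # advance to P(X = i + 1)
    return max(0.0, 1.0 - below)


# Hypothetical gene: 8 observed de novo PTVs where 0.2 are expected by chance.
hypothetical_observed = 8
hypothetical_expected = 0.2
print(ptv_enrichment(hypothetical_observed, hypothetical_expected))
print(poisson_upper_tail(hypothetical_observed, hypothetical_expected))
```

For very small tail probabilities the subtraction from 1 loses precision, so a production analysis would use a dedicated survival function rather than this direct sum.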

Cited by

  • Federated analysis of autosomal recessive coding variants in 29,745 developmental disorder patients from diverse populations.
    Chundru VK, Zhang Z, Walter K, Lindsay SJ, Danecek P, Eberhardt RY, Gardner EJ, Malawsky DS, Wigdor EM, Torene R, Retterer K, Wright CF, Ólafsdóttir H, Guillen Sacoto MJ, Ayaz A, Akbeyaz IH, Türkdoğan D, Al Balushi AI, Bertoli-Avella A, Bauer P, Szenker-Ravi E, Reversade B, McWalter K, Sheridan E, Firth HV, Hurles ME, Samocha KE, Ustach VD, Martin HC. Nat Genet. 2024 Oct;56(10):2046-2053. doi: 10.1038/s41588-024-01910-8. Epub 2024 Sep 23. PMID: 39313616.
  • The genetic cause of neurodevelopmental disorders in 30 consanguineous families.
    Paracha SA, Nawaz S, Tahir Sarwar M, Shaheen A, Zaman G, Ahmed J, Shah F, Khwaja S, Jan A, Khan N, Kamal MA, Alam Q, Abbas S, Farman S, Waqas A, Alkathiri A, Hamadi A, Santoni F, Ullah N, Khalid B, Antonarakis SE, Fakhro KA, Umair M, Ansar M. Front Med (Lausanne). 2024 Aug 30;11:1424753. doi: 10.3389/fmed.2024.1424753. eCollection 2024. PMID: 39281811.
  • Epigenomic dysregulation correlates with arachnoid cyst formation and neurodevelopmental symptoms.
    Kundishora AJ, Kahle KT. Nat Med. 2023 Mar;29(3):541-542. doi: 10.1038/s41591-023-02239-1. PMID: 36932244.
  • The frequency of somatic mutations in cancer predicts the phenotypic relevance of germline mutations.
    Draetta EL, Lazarević D, Provero P, Cittaro D. Front Genet. 2023 Jan 9;13:1045301. doi: 10.3389/fgene.2022.1045301. eCollection 2022. PMID: 36699457.
  • Variability in Phelan-McDermid Syndrome in a Cohort of 210 Individuals.
    Nevado J, García-Miñaúr S, Palomares-Bralo M, Vallespín E, Guillén-Navarro E, Rosell J, Bel-Fenellós C, Mori MÁ, Milá M, Del Campo M, Barrúz P, Santos-Simarro F, Obregón G, Orellana C, Pachajoa H, Tenorio JA, Galán E, Cigudosa JC, Moresco A, Saleme C, Castillo S, Gabau E, Pérez-Jurado L, Barcia A, Martín MS, Mansilla E, Vallcorba I, García-Murillo P, Cammarata-Scalisi F, Gonçalves Pereira N, Blanco-Lago R, Serrano M, Ortigoza-Escobar JD, Gener B, Seidel VA, Tirado P, Lapunzina P; Spanish PMS Working Group. Front Genet. 2022 Apr 12;13:652454. doi: 10.3389/fgene.2022.652454. eCollection 2022. PMID: 35495150.

References

    1. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature. 2017;542:433–438.
    2. Martin HC, et al. Quantifying the contribution of recessive coding variation to developmental disorders. Science. 2018;362:1161–1164.
    3. Kircher M, et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46:310–315.
    4. Samocha KE, et al. Regional missense constraint improves variant deleteriousness prediction. bioRxiv. 2017. doi: 10.1101/148353.
    5. Kosmicki JA, et al. Refining the role of de novo protein-truncating variants in neurodevelopmental disorders by using population reference samples. Nat Genet. 2017;49:504–510.
