Nat Commun. 2024 Jul 15;15(1):5924.
doi: 10.1038/s41467-024-49782-0.

Large-scale whole-exome sequencing analyses identified protein-coding variants associated with immune-mediated diseases in 350,770 adults

Liu Yang et al. Nat Commun. 2024.

Abstract

The genetic contribution of protein-coding variants to immune-mediated diseases (IMDs) remains underexplored. Through whole exome sequencing of 40 IMDs in 350,770 UK Biobank participants, we identified 162 unique genes in 35 IMDs, among which 124 were novel genes. Several genes, including FLG which is associated with atopic dermatitis and asthma, showed converging evidence from both rare and common variants. 91 genes exerted significant effects on longitudinal outcomes (interquartile range of Hazard Ratio: 1.12-5.89). Mendelian randomization identified five causal genes, of which four were approved drug targets (CDSN, DDR1, LTA, and IL18BP). Proteomic analysis indicated that mutations associated with specific IMDs might also affect protein expression in other IMDs. For example, DXO (celiac disease-related gene) and PSMB9 (alopecia areata-related gene) could modulate CDSN (autoimmune hypothyroidism-, psoriasis-, asthma-, and Graves' disease-related gene) expression. Identified genes predominantly impact immune and biochemical processes, and can be clustered into pathways of immune-related, urate metabolism, and antigen processing. Our findings identified protein-coding variants which are the key to IMDs pathogenesis and provided new insights into tailored innovative therapies.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Guideline of the study.
This figure presents the analytical workflow and key findings. A Using WES data from UKB, we first performed exome-wide analysis across 40 IMDs. The top left and top middle panels depict exome-wide gene-based analysis and single-variant analysis, respectively. External replication, internal replication, and sensitivity analyses were then performed (top right panel). B We then estimated genetic overlap across IMDs (second left panel), pairwise genetic correlations (second middle panel), and pleiotropic effects (second right panel) at the exome level. C We further investigated the clinical implications of IMD-associated genes (third panel). Panel one: quantifying longitudinal disease risks for putatively pathogenic variants; panels two and three: protein expression alterations attributable to genetic mutations, and causal inference; panel four: proteome-wide analysis between identified genes and 1464 proteins; panel five: assessing the potential druggability of genes by querying databases. D Finally, we explored the biological insights underlying IMD-associated genes (last panel). Last left panel: associations between IMD-associated genes and multi-omics traits; last middle panel: shared biological pathways among identified genes; last right panel: expression of the identified genes in different cell types. WES whole-exome sequencing, UKB the United Kingdom Biobank, IMD immune-mediated disease, GWAS genome-wide association study, MR Mendelian randomization, PPI protein-protein interaction, MRI magnetic resonance imaging.
Fig. 2. Exome-wide analysis of rare genetic variation for 40 IMDs in UKB.
A Multiple-trait Manhattan plot representing the results from exome-wide gene-based tests for each IMD. The red dotted line indicates the significance threshold at 2.5 × 10⁻⁶. SKAT P-values are two-sided and unadjusted. The significance threshold is set at FDR-corrected P-value < 0.05 for multiple comparisons. B Case-control enrichment of rare protein-coding variants in identified genes across consequence categories. Each dot represents the OR; the putatively damaging nature of the variants decreases from dark blue to light blue according to the legend. C The number of predicted functional consequences, represented by color, among risk-enhancing (OR > 1) and protective (OR < 1) associations. D Box plots of ORs in the five predicted functional consequence categories. Horizontal lines within boxes denote medians; box edges mark the 25th and 75th percentiles (the interquartile range). Participant count: 350,770, encompassing a diverse cohort. Gene distribution is as follows: 90 genes with pLOF mutations; 45 genes with REVEL scores 75–100; 56 genes with scores 50–75; 60 genes with scores 25–50; and 50 genes with scores 0–25.
IMD immune-mediated disease, UKB the United Kingdom Biobank, pLOF predicted loss-of-function, PC principal component, FDR false discovery rate, OR odds ratio, UR ultra-rare, R rare, pmis predicted deleterious missense, REVEL rare exome variant ensemble learner, RF rheumatic fever, AA alopecia areata, ACD allergic contact dermatitis, AD atopic dermatitis, AU allergic urticaria, Bullouse bullous disorders, Lichen Lichen planus, SLE systemic lupus erythematosus, ADGC allergic and dietetic gastro-enteritis and colitis, Celiac celiac disease, Crohn Crohn’s disease, Hepatitis autoimmune hepatitis, PBC primary biliary cirrhosis, UC ulcerative colitis, T1D diabetes mellitus (Type I), Graves Graves’ disease, HPT autoimmune hypothyroidism, AP allergic purpura, ID immunodeficiency with predominantly antibody defects, ITP idiopathic thrombocytopenic purpura, PA pernicious anemia, SD sarcoidosis, AS ankylosing spondylitis, Behcet Behcet’s disease, NV necrotizing vasculopathies, OA osteoarthritis, PR polymyalgia rheumatica, PST psoriatic and enteropathic arthropathies, Sicca Sicca syndrome (Sjogren’s syndrome), GB Guillain-Barre syndrome, MG myasthenia gravis, MS multiple sclerosis, AR allergic rhinitis.
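The two multiple-testing ideas named in the Fig. 2 caption can be illustrated with a short sketch. It assumes, hypothetically, that an exome-wide gene-based threshold of 2.5 × 10⁻⁶ corresponds to a Bonferroni correction of 0.05 over roughly 20,000 genes (the caption does not state the derivation), and it uses the Benjamini-Hochberg procedure as one standard way to obtain FDR-adjusted P-values. The example P-values are made up and are not results from the study.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Assumption: about 20,000 protein-coding genes tested.
gene_threshold = bonferroni_threshold(0.05, 20_000)

# Hypothetical raw P-values for four gene-based tests.
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_fdr]
```

Here `gene_threshold` is a fixed per-test cutoff, whereas `example_fdr` rescales each P-value by its rank, which is why the two criteria in the caption can select different sets of genes.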
Fig. 3. Exome-wide analysis of common genetic variation for 40 IMDs in UKB.
A A circos plot representing significant associations between common genetic variants (MAF > 1%), their reference genes, and linked IMDs. The shape of each point indicates whether the mutation was associated with higher or lower disease risk, while its size conveys the strength of association measured through coefficients. Statistical significance was ascertained using a logistic model in PLINK, with the conventional threshold set at 5 × 10⁻⁸ (before adjustment for multiple comparisons, two-sided P-value). B A scatter plot illustrating the convergence of GWAS signals (P < 5 × 10⁻⁸, two-sided, before adjustment for multiple comparisons) across identified common genes. The x-axis labels the phenotypes (IMDs), while the y-axis presents the coefficient for each association test. C The pleiotropic impacts of detected common protein-coding genes on various IMDs. The shape of the point denotes whether the gene-disease link is novel, previously identified, or a replication of past findings. IMD immune-mediated disease, UKB the United Kingdom Biobank, MAF minor allele frequency, GWAS genome-wide association study, AD atopic dermatitis, Lichen Lichen planus, SLE systemic lupus erythematosus, Celiac Celiac disease, Crohn Crohn’s disease, UC ulcerative colitis, T1D diabetes mellitus (Type I), Graves Graves’ disease, HPT autoimmune hypothyroidism, SD sarcoidosis, AS ankylosing spondylitis, NV necrotizing vasculopathies, PR polymyalgia rheumatica, PST psoriatic and enteropathic arthropathies, Sicca Sicca syndrome (Sjogren’s syndrome), MS multiple sclerosis, Behcet Behcet’s disease, Hepatitis autoimmune hepatitis, ID immunodeficiency with predominantly antibody defects.
Fig. 4. Burden heritability and genetic correlations of IMDs.
A The burden heritability of IMDs calculated by burden heritability regression (“Methods” section). The x-axis indicates the specific IMDs and the y-axis indicates the heritability based on rare variants. The graph showcases the aggregate heritability for each disease and highlights the most impactful category of heritability for each phenotype. B Significant genetic correlations between the 30 IMDs with identified rare variants. Only substantial pairwise correlations (with a correlation coefficient, rg > 0.3) are emphasized, with weaker correlations appearing nearly transparent. The size of each node (represented by circles) indicates the number of significant correlations associated with a particular phenotype. The intensity of the line color between nodes conveys the strength and direction of the correlation coefficient. IMD immune-mediated disease, BHR burden heritability regression, pLOF predicted loss-of-function, MAF minor allele frequency, pmis predicted deleterious missense, RF rheumatic fever, MS multiple sclerosis, MG myasthenia gravis, Sicca Sicca syndrome (Sjogren’s syndrome), NV necrotizing vasculopathies, Behcet Behcet’s disease, AS ankylosing spondylitis, SD sarcoidosis, PA pernicious anemia, ITP idiopathic thrombocytopenic purpura, AP allergic purpura, HPT autoimmune hypothyroidism, Graves Graves’ disease, T1D diabetes mellitus (Type I), PBC primary biliary cirrhosis, Hepatitis autoimmune hepatitis, Crohn Crohn’s disease, Celiac Celiac disease, ADGC allergic and dietetic gastro-enteritis and colitis, SLE systemic lupus erythematosus, Bullouse bullous disorders, AD atopic dermatitis, ACD allergic contact dermatitis, AA alopecia areata.
Fig. 5. Protein expression alterations and corresponding druggability.
A A series of density plots illustrating the differences in protein expression levels between mutation carriers and non-carriers. P-values are two-sided, adjusted by the false discovery rate. The black lines in each plot indicate the median protein expression level for each group. Protein expression data are available for only 16 of the identified genes. B MR analysis to discern the causal link between protein expression levels and IMDs. GWAS analyses were first performed on protein expression, followed by the selection of SNPs from GWAS as instrumental variables. The point’s edge color represents the negative logarithm of the FDR P-value, whereas the interior color stands for the coefficient. IVW was selected as the primary method for MR. C Gene plots displaying the protein-coding variants that contribute to the amino-acid signals for four protein entities (CBLB, DHX3, CIITA, and CAPN9). The protein domains and missense-constrained regions of the gene are also labeled. IMD immune-mediated disease, GWAS genome-wide association study, SNP single nucleotide polymorphism, FDR false discovery rate, MR Mendelian randomization, IVW inverse-variance weighted, AD atopic dermatitis, Celiac Celiac disease, PA pernicious anemia, Crohn Crohn’s disease, Graves Graves’ disease, SD sarcoidosis, MG myasthenia gravis, HPT autoimmune hypothyroidism, PBC primary biliary cirrhosis.
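The inverse-variance weighted (IVW) estimator mentioned in the Fig. 5 caption can be sketched as follows. This is a minimal fixed-effect version under stated assumptions, not the authors' pipeline: the function name `ivw_estimate` and all SNP summary statistics below are hypothetical, and the sketch assumes independent instruments with negligible error in the SNP-exposure effects.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error.

    Each SNP contributes a ratio estimate (beta_outcome / beta_exposure),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        numerator += bx * by / se**2
        denominator += bx**2 / se**2
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Hypothetical summary statistics for three instruments:
# SNP effects on protein level (exposure) and on disease (outcome).
bx = [0.10, 0.20, 0.15]
by = [0.05, 0.10, 0.075]
se_y = [0.01, 0.01, 0.02]

causal_beta, causal_se = ivw_estimate(bx, by, se_y)
```

In this toy input every SNP has the same ratio of outcome effect to exposure effect, so the weighted average returns that shared ratio; with real data the ratios differ and the weights decide how much each instrument counts.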
Fig. 6. Biological insights from multi-omics analysis.
All P-values are two-sided and false discovery rate corrected. A Significant associations between rare IMD genes and biological indicators. The color and shape of the points indicate the class of the associated biological indicators and the class of mutations. B The number of significant associations between common IMD genes and biological indicators. For MRI, only the regions with the top ten significant associations are displayed. C Significant pathway enrichment of PPI clusters in KEGG, GO, and Reactome. D Uniform Manifold Approximation and Projection of scRNA-seq data of blood, bone marrow, bladder, and kidney for TET2 and LRP1. IMD immune-mediated disease, PC principal component, OR odds ratio, MRI magnetic resonance imaging, scRNA single-cell RNA, KEGG Kyoto Encyclopedia of Genes and Genomes, GO gene ontology, pLOF predicted loss-of-function, UR ultra-rare, R rare, pmis predicted deleterious missense, TP total protein, RF rheumatoid factor, DBil direct bilirubin, GammaGT gamma glutamyltransferase, SHBG sex hormone-binding globulin, ALP alkaline phosphatase, Cr creatinine, CRP C-reactive protein, Alb albumin, ALT alanine aminotransferase, Ca calcium, TBil total bilirubin, Glu glucose, Testo testosterone, ApoB apolipoprotein B, UA urate, HDLC HDL cholesterol, HbA1c glycated hemoglobin, CysC cystatin C, LDLC LDL direct, TC total cholesterol, ApoA apolipoprotein A, AST aspartate aminotransferase, TG triglycerides, WBC c white blood cell count, Lym p lymphocyte percentage, Mon p monocyte percentage, NLR neutrophil lymphocyte ratio, Lym c lymphocyte count, Neu p neutrophil percentage, Mon c monocyte count, PLT platelet, Neu c neutrophil count, NLR neutrophil count/lymphocyte count, FEV1 forced expiratory volume in 1-second (Fev1), FVC Z forced vital capacity (Fvc) Z-score, FEV1_FVC_ratio_Z Fev1/Fvc ratio Z-score, ERS_ATS reproducibility of spirometry measurement using ERS/ATS criteria, Z_QC spirometry QC measure, FEV1_pdpc forced expiratory volume in 1-second (Fev1), predicted percentage, FVC forced vital capacity (Fvc), FVC_B forced vital capacity (Fvc), best measure, FEV1_B forced expiratory volume in 1-second (Fev1), best measure, FEV1_pr forced expiratory volume in 1-second (Fev1), predicted, Pairs_MT pairs matching, P-cingulate posterior cingulate, IM-CG isthmuscingulate, S-Temporal superiortemporal, PPCL parsopercularis, I-Temporal inferiortemporal, S-Marginal supramarginal, I-Parietal inferiorparietal, P-HPC Parahippocampal, P-Calcarian Pericalcarine.
