The contribution of genetic determinants of blood gene expression and splicing to molecular phenotypes and health outcomes



Genetic regulation of local gene expression and splicing

We performed bulk RNA-seq on peripheral blood collected from 4,732 blood donors recruited as part of the INTERVAL study (Methods). We quantified the expression of 19,173 autosomal genes and 111,937 de novo transcript splicing phenotypes (hereafter referred to as ‘splicing events’), defined from differential intron usage ratios in 11,016 genes. We then mapped local (cis) expression QTLs (eQTLs) within ±1 Mb of the transcription start site (TSS) and splicing QTLs (sQTLs) within ±500 kilobase pairs (kb) of the center of the spliced region.
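The splicing phenotypes above are described as intron usage ratios. A minimal sketch, assuming a LeafCutter-style definition (the junction reads supporting each intron divided by the total junction reads of the intron cluster it belongs to), is shown below. The cluster labels and read counts are hypothetical, and the study's exact quantification pipeline is specified in its Methods.

```python
from collections import defaultdict


def intron_usage_ratios(junction_counts: dict[str, int], cluster_of: dict[str, str]) -> dict[str, float]:
    """Return the fraction of a cluster's junction reads supporting each intron.

    Assumes a LeafCutter-style model: introns that share splice sites are grouped
    into clusters, and each intron's ratio is its read count over the cluster total.
    """
    totals: dict[str, int] = defaultdict(int)
    for intron, count in junction_counts.items():
        totals[cluster_of[intron]] += count
    ratios = {}
    for intron, count in junction_counts.items():
        total = totals[cluster_of[intron]]
        # A cluster with no reads in this sample gives an undefined ratio (NaN).
        ratios[intron] = count / total if total > 0 else float("nan")
    return ratios


# Hypothetical sample: two alternative introns share a donor site (cluster "c1").
counts = {"chr1:100-200": 30, "chr1:100-350": 10, "chr2:500-900": 8}
clusters = {"chr1:100-200": "c1", "chr1:100-350": "c1", "chr2:500-900": "c2"}
print(intron_usage_ratios(counts, clusters))
# {'chr1:100-200': 0.75, 'chr1:100-350': 0.25, 'chr2:500-900': 1.0}
```

Normalizing within clusters makes the phenotype insensitive to overall gene expression, which is why splicing ratios can carry genetic signals distinct from eQTLs.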

We identified 17,233 genes (89.9% of the 19,166 tested) with at least one significant cis-eQTL (cis-eGenes) at a false-discovery rate (FDR) < 0.05 (Supplementary Tables 1 and 2; Methods). Stepwise conditional analyses for each cis-eQTL revealed 56,959 independent signals (53,457 unique lead variants), with a median of three independent signals per gene (range = 1–23; Supplementary Tables 1 and 3; Methods). We compared our results with those of the eQTLGen Consortium (ref. 9; n = 31,684 individuals). z scores of eQTL lead SNPs were highly correlated between the two studies (Pearson’s r = 0.9; Supplementary Fig. 1 and Supplementary Tables 2 and 4), highlighting the consistency of eQTL discovery across independent datasets and mapping technologies.
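The FDR < 0.05 criterion above can be illustrated with the Benjamini–Hochberg procedure applied to one gene-level P value per gene. This is a minimal sketch under that assumption; how the study derived gene-level P values and controlled the FDR is specified in its Methods, and the gene names and P values below are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values (q values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that q values are monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


# Hypothetical gene-level P values (for example, the best cis-variant per gene).
gene_p = {"GENE_A": 0.001, "GENE_B": 0.045, "GENE_C": 0.012, "GENE_D": 0.3, "GENE_E": 0.02}
q = benjamini_hochberg(list(gene_p.values()))
egenes = [gene for gene, qv in zip(gene_p, q) if qv < 0.05]
print(egenes)  # ['GENE_A', 'GENE_C', 'GENE_E']
```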

Next, we identified 29,514 splicing events with a cis-sQTL at FDR < 0.05 (Supplementary Tables 1 and 5). These splicing events mapped to 6,853 genes (cis-sGenes), with a median of three splicing events per cis-sGene (range = 1–128), including 543 cis-sGenes that were not identified as cis-eGenes. Splicing events with cis-sQTLs had a median length of 1,549 base pairs (bp) and excised protein-coding sequence in 32.4% of cases (the remainder corresponded to intronic and UTR excisions). The median distance from cis-sQTL lead variants to the center of the splicing event was 187 bp upstream, and lead variants formed a bimodal distribution around the start and end of the sGene (Fig. 2a).
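Calling a distance ‘upstream’ requires orienting genomic coordinates by the gene's strand. The minimal sketch below assumes that negative values denote upstream (5′) of the splicing event's center with respect to the direction of transcription. The function name and coordinates are hypothetical.

```python
def signed_distance_to_center(variant_pos: int, event_start: int, event_end: int, strand: str) -> float:
    """Distance from a variant to the center of a splicing event, oriented by strand.

    Negative values are upstream (5') and positive values downstream (3') of the
    event center, relative to the direction of transcription.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    center = (event_start + event_end) / 2
    offset = variant_pos - center
    # On the minus strand, transcription runs toward lower coordinates, so flip the sign.
    return offset if strand == "+" else -offset


# Hypothetical event spanning 10,000-11,000 (center 10,500) on either strand.
print(signed_distance_to_center(10_313, 10_000, 11_000, "+"))  # -187.0 (upstream)
print(signed_distance_to_center(10_687, 10_000, 11_000, "-"))  # -187.0 (upstream)
```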

Fig. 2: Genetic influences on gene expression and splicing.
a, Distribution of lead variants at cis-eQTLs and cis-sQTLs around the TSS and gene body (normalized to the median gene length of 24 kb). b, Schematic representation of the trans-QTL mapping analysis approach and summary of the QTL discovery results. c, Circos plot of the trans-splicing of 18 sGenes by the cis-eQTL for QKI. TES, transcription end site.

After conditional analysis for each cis-sQTL, we identified 47,050 independent signals (34,205 unique lead variants), with a median of one independent signal per splicing event (range = 1–20; Supplementary Tables 1 and 6). To characterize independent variant effects on transcript splicing, we compared primary (that is, the most significant independent QTL) and secondary (that is, all other independent QTLs) cis-sQTLs. Primary cis-sQTL signals were enriched within the gene body of sGenes compared to secondary signals (P = 2.84 × 10⁻³¹⁴, chi-squared test; Fig. 2a and Supplementary Fig. 2). Primary cis-sQTL signals were also located closer to the transcription end site than cis-eQTLs (median of 17.36 kb versus 5.51 kb downstream of the TSS; P = 8.42 × 10⁻²⁵⁹, two-sided Wilcoxon test; Supplementary Fig. 2). These observations align with previous analyses of isoform ratio QTLs²¹. Next, we compared the identified sGenes with those from the Genotype-Tissue Expression (GTEx) Consortium²² whole-blood dataset (n = 670 individuals). Of the 3,013 sGenes discovered by GTEx, we tested 2,677, of which 89.0% were also sGenes in our analysis; we additionally identified 4,470 new sGenes (Supplementary Table 7). These results demonstrate the value of quantifying de novo splicing excision events and of a substantially larger sample size.
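The distance comparisons above use a two-sided Wilcoxon rank-sum test. Below is a minimal pure-Python sketch that uses the large-sample normal approximation with average ranks for ties and a tie-corrected variance. The study presumably used a standard statistical package, and the distances in the usage example are hypothetical.

```python
import math


def rank_sum_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided P value). Ties receive average ranks and
    the variance is tie-corrected; no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - mean_u) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical distances (kb downstream of the TSS) for two groups of lead variants.
sqtl_kb = [12.0, 17.4, 25.1, 30.2, 19.8, 22.5]
eqtl_kb = [1.2, 5.5, 3.3, 8.0, 6.1, 4.4]
print(rank_sum_test(sqtl_kb, eqtl_kb))
```

The normal approximation is accurate for genome-scale group sizes such as those compared here; for very small groups an exact test would be preferable.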

To test whether the cis-eQTLs and cis-sQTLs of a given gene were underpinned by the same genetic variant, we performed colocalization analyses. These revealed 3,979 genes (of 6,252 tested) with colocalized signals (Methods). Overall, 49.0% (n = 13,490) of tested splicing events had sQTLs that colocalized with an eQTL for the same gene (Supplementary Table 8). However, of the eQTL-colocalizing splicing events with multiple independent sQTL signals, 82% had additional sQTL loci that did not colocalize with eQTLs. Splicing events with sQTLs that did not colocalize with an eQTL were located further downstream of the TSS (median 20.33 kb) than those with colocalizing sQTL signals (median 12.61 kb; P = 9.8 × 10⁻⁷⁰, two-sided Wilcoxon test; Supplementary Fig. 3).
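Colocalization results such as these are usually reported as posterior probabilities that two traits share a causal variant. The sketch below illustrates the widely used approximate Bayes factor framework, in which Wakefield ABFs are combined into posteriors for hypotheses H0–H4, as in the coloc R package. It assumes a single causal variant per trait, summary statistics (beta, SE) for the same SNPs in both traits, and coloc's default priors. The study's exact method and settings are described in its Methods, and the z scores in the usage example are hypothetical.

```python
import math
from typing import Sequence


def log_abf(beta: Sequence[float], se: Sequence[float], prior_sd: float = 0.15) -> list[float]:
    """Wakefield log approximate Bayes factor for each SNP (quantitative-trait prior)."""
    w = prior_sd**2
    labf = []
    for b, s in zip(beta, se):
        v = s**2
        r = w / (w + v)
        z = b / s
        labf.append(0.5 * (math.log(1 - r) + r * z * z))
    return labf


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiffexp(a: float, b: float) -> float:
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return -math.inf
    return a + math.log1p(-math.exp(b - a))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5) -> list[float]:
    """Posterior probabilities of H0..H4 for two traits measured on the same SNPs."""
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                     # H0: no association
        math.log(p1) + s1,                                       # H1: trait 1 only
        math.log(p2) + s2,                                       # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                     # H4: one shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical three-SNP locus: both traits peak at the first SNP.
se = [0.05, 0.05, 0.05]
eqtl = log_abf([z * 0.05 for z in (6.0, 1.0, 0.5)], se)
sqtl = log_abf([z * 0.05 for z in (5.0, 0.8, 0.2)], se)
pp = coloc_posteriors(eqtl, sqtl)
print(f"PP.H4 = {pp[4]:.3f}")
```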

Genetic effects on distal gene expression and splicing

Next, we investigated the distal (trans) regulatory effects of genetic variants (>5 Mb from the TSS or splicing event). First, we performed an untargeted, all-versus-all trans-eQTL and trans-sQTL analysis. Trans-eQTL z scores were highly correlated for SNP–gene pairs also tested by the eQTLGen Consortium⁹ (Pearson’s r = 0.9; Supplementary Fig. 4). Because our study has lower statistical power than eQTLGen, we focused on replicating the 2,924 most significant eQTLGen trans-eQTL associations (P < 1 × 10⁻²⁰), of which 63% replicated at P < 1 × 10⁻⁶ in INTERVAL. The incomplete overlap may be due to differences in data analysis strategies, including accounting for blood cell counts in the association model⁹.

Given the extreme multiple-testing burden of genome-wide trans-QTL analyses, we focused on the 53,457 conditionally independent lead cis-expression SNPs (eSNPs), as these provide a potential mechanism through which a cis-acting variant can also affect genes in trans. In this targeted analysis, we identified 2,058 trans-eGenes at the Bonferroni-corrected threshold of P < 5 × 10⁻¹¹ (Fig. 2b and Supplementary Tables 1, 2 and 9). These trans-eQTLs corresponded to 2,498 cis-eQTLs, with a median of three trans-eGenes per cis-eQTL (range = 1–284). Some cis-eGenes were associated with many trans-eGenes, such as PLAG1 (n = 284 genes), HYMAI (n = 284) and FUCA2 (n = 267). Compared with all cis-eGenes, cis-eGenes with a concurrent trans-association were significantly enriched for 32 gene ontology (GO) terms. Most of these terms related to transcription regulation and immune response, with ‘metal ion binding’ showing the strongest enrichment (P = 2.6 × 10⁻³⁰; Supplementary Table 10). To further explore these transcriptional regulation mechanisms, we annotated the genes using the Human Transcription Factors database²³. We found a significant enrichment of sequence-specific transcription factors, which represented 14.3% of all cis-eGenes with a trans-association (357/2,498, P = 1.83 × 10⁻³⁸; Methods). Protein domain annotations of these transcription factors revealed a significant enrichment for the C2H2 zinc finger domain (P = 9.74 × 10⁻⁹ after Bonferroni multiple-testing correction), specifically with the Krüppel-associated box domain (P = 3.04 × 10⁻¹⁰; Supplementary Fig. 5). For example, PLAG1, an important regulator of human hematopoietic stem cell dormancy and self-renewal²⁴, encodes a protein with a C2H2 zinc finger domain. We also observed the same enrichment for trans-eGenes (P = 5.73 × 10⁻⁵), although the molecular mechanisms are unclear and need further investigation.
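The transcription factor enrichment above compares how often an annotation occurs among cis-eGenes with a trans-association against all cis-eGenes. One common way to test this is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below assumes that choice. Its background counts are hypothetical because they are not given here, and the study's actual test is described in its Methods.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_draws: int, n_success: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_draws).

    n_total: background genes (including the selected ones); n_success: annotated
    genes in the background; n_draws: selected genes; k: annotated selected genes.
    Equivalent to a one-sided (enrichment) Fisher's exact test on the 2x2 table.
    """
    denom = comb(n_total, n_draws)
    upper = min(n_success, n_draws)
    numer = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, upper + 1)
    )
    return numer / denom  # exact integer arithmetic, one final division


# Hypothetical counts: 30 of 200 selected genes carry the annotation,
# versus 1,000 of 15,000 genes in the background.
p = hypergeom_upper_tail(k=30, n_draws=200, n_success=1_000, n_total=15_000)
print(f"one-sided P = {p:.3g}")
```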

To uncover genetic effects on expression that propagate to distal transcript splicing, we performed a targeted trans-analysis using the same 53,457 conditionally independent lead cis-eSNPs as in the trans-eQTL analysis. This identified significant trans-associations for 644 splicing events (209 trans-sGenes) at the Bonferroni-corrected threshold of P < 8.36 × 10⁻¹², comprising 758 unique trans-splicing SNPs (sSNPs) that corresponded to 566 cis-eGenes (Fig. 2b and Supplementary Tables 1 and 11). Of the 644 splicing events regulated in trans, 240 (in 91 genes) were not regulated in cis, increasing the total number of splicing events with QTLs. The cis-eQTLs of 11 cis-eGenes were implicated in the trans-regulation of more than ten sGenes each. For example, the cis-eQTL for the RNA-binding splice factor QKI was associated with 18 sGenes in trans (the most of any eGene; Fig. 2c). Across all tissues, GTEx reported only 29 trans-sQTL associations, of which only two were present in whole blood: the trans-splicing of FYB1 via the QKI cis-eQTL, and the trans-splicing of ABHD3, for which GTEx did not detect an associated cis-effect for the trans-sSNP²². We replicated both of these trans-sGene observations. For ABHD3, we further show that the trans-sSNP is also a cis-eSNP for the splicing factor TFIP11 and its antisense long noncoding RNA TFIP11-DT, which could regulate the splicing of ABHD3 in trans. The preliminary overlap of trans-sQTL associations with previous experimental validation is reported in the Supplementary Note and Supplementary Table 12. Across the whole dataset, cis-eGenes of trans-sSNPs were significantly enriched for ten GO terms, including ‘nucleosome assembly’ (P = 2.78 × 10⁻⁶) and ‘RNA polymerase II activity’ (P = 1.40 × 10⁻⁵; Supplementary Table 10).
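The two Bonferroni thresholds used for the targeted trans-analyses are approximately consistent with dividing α = 0.05 by the number of SNP–phenotype tests. Here, those tests are taken to be the 53,457 lead cis-eSNPs against either the 19,173 quantified genes or the 111,937 splicing events. The sketch below shows that arithmetic. It is an assumption that these are exactly the test counts used (for example, phenotypes near each SNP may have been excluded), so treat the agreement as approximate.

```python
ALPHA = 0.05
N_LEAD_ESNPS = 53_457        # conditionally independent lead cis-eSNPs
N_GENES = 19_173             # quantified autosomal genes
N_SPLICING_EVENTS = 111_937  # quantified splicing events


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


trans_eqtl_threshold = bonferroni_threshold(ALPHA, N_LEAD_ESNPS * N_GENES)
trans_sqtl_threshold = bonferroni_threshold(ALPHA, N_LEAD_ESNPS * N_SPLICING_EVENTS)
print(f"{trans_eqtl_threshold:.2e}")  # 4.88e-11 (reported as 5 × 10⁻¹¹)
print(f"{trans_sqtl_threshold:.2e}")  # 8.36e-12
```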

Shared genetic etiology across molecular traits

We next compared transcriptional QTLs with QTLs for other omics layers measured in subsets of INTERVAL participants. These data include plasma protein QTLs (pQTLs) from the Olink Target and SomaScan panels, and metabolite QTLs (mQTLs) from the Metabolon and Nightingale Health platforms.

To determine whether genetic signals at a given locus across omics layers reflect shared or distinct causal variants, we performed statistical colocalization analyses (Methods). These revealed colocalization between a cis-eQTL or cis-sQTL and a cis-QTL for 120 Olink-measured proteins (65.9% of analyzed proteins), 404 SomaScan-measured proteins (63.7%), 224 Nightingale-measured metabolites (99.1%) and 495 Metabolon-measured metabolites (81.5%; Fig. 3a and Supplementary Tables 13–16). Across all assessed proteomic and metabolomic traits, we found colocalized signals for 1,229 cis-eGenes and 649 cis-sGenes (1,516 unique genes). For Olink- and SomaScan-measured proteins, genetic effect directions were more often consistent for colocalizing eQTL–pQTL pairs (78.9%) than for noncolocalizing pairs (59.0%; P = 5.4 × 10⁻¹⁰, one-sided Fisher’s exact test). The uncoupling of eQTLs and pQTLs has previously been observed²⁵ and could be due, for example, to post-transcriptional or post-translational mechanisms.
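Comparing effect directions between eQTLs and pQTLs only makes sense once both estimates refer to the same effect allele. The minimal sketch below harmonizes alleles before counting sign agreement. It assumes biallelic SNPs reported on the same strand, so strand-ambiguous A/T and C/G SNPs and strand flips are not handled. The `Association` record and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """Summary statistics for one variant and one trait (hypothetical record type)."""
    effect_allele: str
    other_allele: str
    beta: float


def harmonized_beta(ref: Association, other: Association) -> float:
    """Express `other`'s effect relative to `ref`'s effect allele."""
    if (other.effect_allele, other.other_allele) == (ref.effect_allele, ref.other_allele):
        return other.beta
    if (other.effect_allele, other.other_allele) == (ref.other_allele, ref.effect_allele):
        return -other.beta  # same variant, opposite coding: flip the sign
    raise ValueError("alleles do not match; strand flips are not handled in this sketch")


def direction_concordance(pairs: list[tuple[Association, Association]]) -> float:
    """Fraction of (eQTL, pQTL) pairs whose harmonized effects share a sign.

    Pairs with an exactly zero effect are counted as not concordant.
    """
    same = sum(1 for e, p in pairs if e.beta * harmonized_beta(e, p) > 0)
    return same / len(pairs)


pairs = [
    (Association("A", "G", 0.3), Association("A", "G", 0.1)),    # concordant
    (Association("A", "G", 0.3), Association("G", "A", 0.2)),    # discordant after flipping
    (Association("C", "T", -0.2), Association("T", "C", 0.4)),   # concordant after flipping
]
print(direction_concordance(pairs))
```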

Fig. 3: Colocalization analyses of cis-eQTL and cis-sQTL with other molecular phenotypes.

a, Barplot of the percentage of omics traits with an association signal that colocalized with a cis-eQTL and/or a cis-sQTL. b, Network graph of all pairwise colocalization results. Highlighted examples on the right-hand side include OAS1, IL6R and WARS1.

Of the 99 eQTL–pQTL pairs (364 sQTL–pQTL pairs) analyzed for colocalization on both the Olink and SomaScan platforms, 45 (127) had a colocalized signal on both platforms, 19 (57) on the Olink platform only and 9 (41) on the SomaScan platform only (Supplementary Tables 17 and 18). We annotated these colocalization results with previously reported cross-assay correlations²⁶. Cross-assay correlations were significantly higher for eQTL/sQTL–pQTL pairs that colocalized on both platforms than for pairs that colocalized on only one platform (eQTL–pQTL pairs, P = 3.1 × 10⁻⁴; sQTL–pQTL pairs, P = 1.4 × 10⁻⁷; one-sided Wilcoxon rank-sum test). This suggests that the differences we observed in colocalization results might be due to differences in protein measurement between the two platforms.

Next, we constructed a network to explore and visualize the interconnectedness of colocalized transcriptional and molecular phenotypes, linking each pair of phenotypes that had colocalized signals (Fig. 3b). For example, seven splicing events in OAS1 had cis-sQTLs that colocalized with both the cis-eQTLs for this gene and the OAS1 pQTLs.
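A network like the one described above can be built by treating each phenotype as a node and adding an edge for each colocalizing pair. Groups of phenotypes that share a genetic signal then appear as connected components. The sketch below assumes colocalization results given as (phenotype, phenotype, PP.H4) tuples and a hypothetical PP.H4 cutoff of 0.8. The study's actual edge criterion and visualization tooling are not specified here, and all phenotype names and probabilities are hypothetical.

```python
from collections import defaultdict


def colocalization_network(results, pp_h4_threshold: float = 0.8):
    """Build an undirected graph of phenotypes linked by colocalization.

    `results` is an iterable of (phenotype_a, phenotype_b, pp_h4) tuples; an edge is
    added when the posterior probability of a shared variant exceeds the threshold.
    """
    graph = defaultdict(set)
    for a, b, pp in results:
        if pp > pp_h4_threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph) -> list[set]:
    """Return connected components (sets of phenotypes) via iterative depth-first search."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:  # every neighbor is already a key in graph
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


results = [
    ("gene A eQTL", "gene A splicing event 1", 0.97),
    ("gene A splicing event 1", "protein A pQTL", 0.95),
    ("gene B eQTL", "protein B pQTL", 0.30),
    ("gene C splicing event 2", "metabolite M mQTL", 0.91),
]
network = colocalization_network(results)
for component in connected_components(network):
    print(sorted(component))
```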

To investigate the potential mechanisms by which genetic variants affect protein levels through splicing, we annotated the protein domains affected by splicing events. Of the splicing events that colocalized with pQTLs, 41.0% (401 of 977) excised annotated protein-coding sequences. Splicing has been shown to modulate circulating protein levels through changes in secretion caused by the inclusion or exclusion of transmembrane domains²⁷. This is exemplified by a splicing event that removes exon 6 of FAS, which encodes a cell surface receptor for the FAS ligand (FASL) cytokine. The resulting protein lacks a transmembrane domain, is secreted²⁸ and competitively inhibits FASL binding, leading to decreased apoptosis. We identified both cis-eQTLs for FAS and cis-sQTLs for this splicing event, but these signals were distinct and did not colocalize (maximum posterior probability = 0.02). In contrast, the cis-sQTLs for excision of the transmembrane domain strongly colocalized with the pQTL (posterior probability = 1.00). Similarly, the interleukin-6 and interleukin-7 receptors (IL-6R and IL-7R, respectively) have previously been reported to produce secreted isoforms through the excision of transmembrane domains²⁹,³⁰. Here we show that the pQTLs for IL-6R and IL-7R colocalized with cis-sQTLs excising these transmembrane domain-encoding exons, in the absence of cis-eQTL colocalization (Fig. 3b). This observation emphasizes that transcript splicing, independently of total transcript abundance, is a mechanism through which genetic variation can modify downstream molecular phenotypes. Furthermore, for 69 proteins (98 unique splicing events), we observed a pQTL colocalizing with an sQTL for the excision of a transmembrane domain from the encoding messenger RNA (mRNA). Of these independent sQTL signals, 60.2% (n = 100/166) did not colocalize with eQTLs for the same gene (Supplementary Table 19). Examples include α-1 antitrypsin, encoded by SERPINA1, and apolipoprotein L1, encoded by APOL1. Most of these 69 transmembrane proteins were annotated as single-pass, with only four (ENTPD1, ADGRE2, ADGRE5 and ADGRE1) being multipass transmembrane proteins.
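Deciding whether a splicing event excises a protein domain reduces to an interval-overlap test between the excised genomic region and the genomic segments that encode the domain. The minimal sketch below assumes half-open coordinates on the same assembly and domain-to-genome mappings that have already been computed. The coordinates and domain names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def excised_domains(event: Interval, domains: dict[str, list[Interval]]) -> list[str]:
    """Names of domains with at least one coding segment inside the excised region."""
    return [
        name
        for name, segments in domains.items()
        if any(overlap_bp(event, segment) > 0 for segment in segments)
    ]


# Hypothetical excised region and domain-encoding segments.
event = Interval("chr1", 1_000, 1_500)
domains = {
    "transmembrane": [Interval("chr1", 1_200, 1_263)],
    "extracellular": [Interval("chr1", 400, 900)],
}
print(excised_domains(event, domains))  # ['transmembrane']
```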

To maximize statistical power for colocalization, we extended our analyses to the SomaScan-pQTL dataset from deCODE⁸ (n = 35,559 individuals and 4,719 proteins) and the Olink-pQTL dataset from the UK Biobank Pharma Proteomics Project¹² (UKB-PPP; n = 54,219 individuals and 2,941 proteins). Colocalization analyses between 1,608 Olink- and 1,410 SomaScan-measured proteins and our transcriptional phenotypes increased the number of proteins with pQTL–eQTL/sQTL colocalizations from 120 to 1,203 for Olink and from 404 to 984 for SomaScan. The eGenes and splicing events whose QTLs colocalized with pQTLs overlapped substantially between our internal and the larger external pQTL cohorts. In UKB-PPP, we replicated 95.1% and 79.3% of eQTL and sQTL colocalizations, respectively, and in deCODE, 87.0% and 80.3%, respectively (Supplementary Tables 13–15; web portal).

Mapping causal transcriptional events on molecular phenotypes

To assess the causal effects of transcriptional phenotypes on downstream molecular phenotypes, we performed mediation analyses focusing on colocalizing molecular traits assayed in the INTERVAL study (Fig. 4a; Methods). The expression of 143 cis-eGenes significantly mediated the effect of 413 cis-eSNPs on 202 downstream molecular phenotypes, including 101 SomaScan-measured proteins, 54 Olink-measured proteins, 39 Nightingale-measured metabolites and 8 Metabolon-measured metabolites. In total, this comprised 525 significant eQTL mediation models (variant–gene–molecular phenotype triplets; Fig. 4b). Similarly, 106 splicing events in 47 sGenes significantly mediated the effect of 152 cis-sSNPs on 50 downstream molecular phenotypes, including 32 SomaScan-measured proteins, 16 Olink-measured proteins, 1 Nightingale-measured metabolite and 1 Metabolon-measured metabolite, comprising 241 significant sQTL mediation models (Supplementary Tables 20 and 21).
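A mediation model of this kind splits a variant's effect on a molecular phenotype into a part that passes through the transcriptional phenotype and a direct remainder. The minimal sketch below applies the classic product-of-coefficients approach to simulated data. It assumes linear models without covariates and genotype coded as allele dosage, and it is not necessarily the estimator used in the study. The study's Methods specify the model and how significance and confidence intervals were obtained.

```python
import numpy as np


def proportion_mediated(g: np.ndarray, m: np.ndarray, y: np.ndarray) -> float:
    """Product-of-coefficients estimate of the share of the G->Y effect passing through M.

    Fits M ~ G (slope a) and Y ~ G + M (slopes c' for G and b for M) by ordinary
    least squares with an intercept, and returns a*b / (a*b + c').
    """
    ones = np.ones_like(g, dtype=float)
    a = np.linalg.lstsq(np.column_stack([ones, g]), m, rcond=None)[0][1]
    coef = np.linalg.lstsq(np.column_stack([ones, g, m]), y, rcond=None)[0]
    c_prime, b = coef[1], coef[2]
    indirect = a * b
    return indirect / (indirect + c_prime)


rng = np.random.default_rng(0)
n = 5_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # allele dosage
m = 0.5 * g + rng.normal(size=n)                # transcriptional mediator
y = 0.4 * m + 0.1 * g + rng.normal(size=n)      # downstream molecular phenotype
print(round(proportion_mediated(g, m, y), 3))   # simulated truth: 0.2 / 0.3
```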

Fig. 4: Mediation analyses of molecular phenotypes with transcriptional QTLs.

a, Schematic representation of the tested mediation model, in which eQTL and sQTL phenotypes mediate the relationship between genomic variants and levels of molecular phenotypes. The images depicting ‘independent genomic variants’ (created with NIAID NIH Bioart) and ‘molecular phenotypes’ (PDB code 2F6W) were reproduced from public databases. b, Total number of molecular phenotypes mediated by sQTLs and eQTLs. c, Colocalization of sQTLs excising the transmembrane domains of the interleukin receptors IL6R and IL17RA, and mediation of their effects on plasma protein levels (n = 3,024 for IL17RA and n = 3,072 for IL6R). Central points represent mediation effect estimates; error bars represent 95% confidence intervals. d, Schematic representation of the splicing events excising the transmembrane domains of IL6R and IL17RA.

Previous reports showed that the missense variant rs2228145 affects IL-6R ectodomain shedding by altering one of the cleavage sites of the ADAM10/ADAM17 metalloproteinases³¹,³². In line with this finding, the IL6R transmembrane splicing event described above mediated only a minority of the effect on Olink-measured plasma protein abundance (4.67%, P = 1.12 × 10⁻⁴; Fig. 4c,d). That effect belongs to the lead SNP (rs12126142), which is in high LD (r² > 0.99; D′ > 0.99) with this missense variant. This suggests a potential dual action of the sSNP or tagged variants in removing this domain, and hence creating a soluble isoform, through both splicing and proteolytic pathways. Conversely, for the colocalized signal (lead cis-sSNP rs34495746) between splicing of the IL17RA transmembrane domain and levels of its plasma protein, most of the effect was mediated by transcript splicing (90.41%, P = 1.14 × 10⁻⁴³; Fig. 4c). Consistent with this observation, neither the lead SNP nor any strongly tagging SNPs (r² > 0.8) were missense variants.

Deconvoluting molecular mechanisms underlying GWAS loci

Molecular QTLs can provide insights into the mechanisms through which genetic variants influence disease risk³³. We performed colocalization analyses with genetic association signals for 20 disease phenotypes from the FinnGen project (release 9)³⁴, prioritized based on their relevance to the circulatory system and available sample size (that is, ≥1,000 cases; Supplementary Table 22).

Disease-associated signals colocalized with 649 cis-eGenes and 365 cis-sGenes (1,035 splicing events) across all tested traits (Supplementary Tables 23 and 24). Some of these independent signals (136 of 981 cis-eQTL signals and 304 of 1,589 cis-sQTL signals) also colocalized with pQTLs and mQTLs, revealing the regulatory pathways underlying these complex trait-associated variants. For example, a cis-sQTL for the transmembrane domain splicing of IL7R colocalized with an association locus for dermatitis and eczema, as well as with a pQTL for IL-7R in UKB-PPP (Fig. 5a). This analysis implicates soluble isoforms of IL-7R generated by alternative splicing in this condition. The alternative allele of rs6897932 (T) is associated with decreased excision of the IL7R transmembrane domain, lower plasma levels of IL-7R and reduced risk of dermatitis and eczema. This allele has previously been associated with decreased lymphocyte count³⁵ and decreased risk of multiple sclerosis³⁶, suggesting consistent therapeutic implications.

Fig. 5: Multitrait colocalization of cis-eQTLs and cis-sQTLs with molecular phenotypes and health outcomes.

a, Putative pathways and directions of effect of sQTL signals for IL7R and WARS1 associated with plasma protein levels, dermatitis and eczema, and hypertension, respectively. The image depicting ‘soluble protein’ was reproduced from a public database (PDB code 2F6W). b, Gene-level summary of colocalization of cis-eQTLs and cis-sQTLs with COVID-19 HGI summary statistics. The red dashed line represents genome-wide significance (P = 5 × 10⁻⁸), and the height toward the center represents the significance of the GWAS association. c, Example of a multitrait colocalization for COVID-19 at OAS1, with GWAS summary statistics, cis-pQTL, cis-eQTL and cis-sQTL.

Tryptophanyl-tRNA synthetase 1 (encoded by WARS1) exists in both secreted and intracellular forms³⁷, with downstream effects on vascular permeability³⁸. Here we found a cis-sQTL for excision of exon 10 of WARS1 (encoding part of the tRNA synthetase domain) that colocalized with both the WARS1 pQTLs and risk of hypertension in FinnGen (Fig. 5a). The alternative allele of rs724391 (C) is associated with decreased excision of exon 10, higher plasma WARS1 levels and increased risk of hypertension.

Finally, we performed a genetic look-up of all independent signals at the identified cis-eQTLs and cis-sQTLs using data from the Open Targets Genetics portal (v22.10). The results are available on our web portal.

Transcriptional mechanisms underlying COVID-19 GWAS loci

Most whole-blood RNA derives from circulating immune cells. Given the importance of the host immune response in COVID-19, we performed colocalization analyses of the identified eQTLs and sQTLs with genetic loci associated with COVID-19 susceptibility and severity from the pan-biobank COVID-19 Host Genetics Initiative³⁹. We found colocalized signals with COVID-19 loci for 67 cis-eGenes and 42 cis-sGenes (91 splicing events; Supplementary Tables 25 and 26 and Fig. 5b), 17 of which were shared between the two sets.

Previous analyses have identified genetic variants that affect splicing of OAS1 (refs. 40,41), and these variants have subsequently been implicated in COVID-19 severity⁴¹. Consistent with these data, we observed colocalization of an eQTL, and of sQTLs for seven splicing events at the OAS1 locus, with COVID-19 (Fig. 5c). Adjusting for OAS1 gene expression levels did not ablate the sQTL signals (P < 1 × 10⁻¹⁶), suggesting the presence of multiple independent transcriptional mechanisms at this locus. In addition, these eQTLs and sQTLs colocalized with the OAS1 pQTL, suggesting that genetic variants influence disease risk through transcriptional changes that affect soluble protein levels.
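Testing whether an sQTL signal survives adjustment for gene expression amounts to refitting the genotype association with expression as a covariate. The minimal sketch below works on simulated data. It assumes a linear model with genotype coded as allele dosage and a single expression covariate. The study's actual model and covariates are described in its Methods.

```python
from typing import Optional

import numpy as np


def genotype_t_stat(y: np.ndarray, g: np.ndarray, covariates: Optional[np.ndarray] = None) -> float:
    """OLS t statistic for the genotype coefficient, optionally adjusting for covariates."""
    columns = [np.ones_like(g, dtype=float), g]
    if covariates is not None:
        columns.append(covariates)
    x = np.column_stack(columns)
    coef = np.linalg.lstsq(x, y, rcond=None)[0]
    residuals = y - x @ coef
    dof = x.shape[0] - x.shape[1]
    sigma2 = residuals @ residuals / dof
    cov = sigma2 * np.linalg.inv(x.T @ x)
    return coef[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(1)
n = 3_000
g = rng.binomial(2, 0.4, size=n).astype(float)
expression = 0.6 * g + rng.normal(size=n)
# Splicing ratio with a direct genetic effect plus a dependence on expression.
splicing = 0.5 * g + 0.2 * expression + rng.normal(size=n)
# Splicing ratio driven only through expression (no direct genetic effect).
splicing_via_expr = 0.5 * expression + rng.normal(size=n)

for name, phenotype in [("direct", splicing), ("via expression", splicing_via_expr)]:
    t_raw = genotype_t_stat(phenotype, g)
    t_adj = genotype_t_stat(phenotype, g, covariates=expression)
    print(f"{name}: t = {t_raw:.1f} unadjusted, {t_adj:.1f} adjusted for expression")
```

In the first simulated case the genotype signal persists after adjustment. In the second it largely disappears, which is the pattern expected if splicing changes merely tracked expression.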

Furthermore, the GWAS signals for COVID-19 susceptibility and severity at the IFNAR2 locus (encoding interferon α/β receptor 2) colocalized with a cis-eQTL and with cis-sQTLs associated with 10 splicing events in this gene (Supplementary Fig. 6). These included a splicing event excising exons 8 and 9, which encode the IFNAR2 transmembrane domain. Rare loss-of-function (stop-gain) mutations in exon 9 of this gene have previously been reported to increase the risk of severe COVID-19 (ref. 42). Although IFNAR2 was not measured by the proteomic assays, isoforms of IFNAR2 lacking the transmembrane domain are known to generate a soluble protein⁴³, and significantly higher levels of soluble IFNAR2 have been observed in the serum of patients with severe COVID-19 (ref. 44). However, the role of splicing of this gene in disease severity has not been previously reported. Notably, the colocalizing IFNAR2 eQTLs are also trans-sQTLs for five splicing events in IFI27, four of which have no association in cis. Our results provide evidence for a mechanism whereby common variants regulating IFNAR2 splicing could contribute to disease severity through effects on protein solubility.



