Behind the Paper

One Gene, Different Story: How Isoform-Level Expression Uncovers Hidden Cancer Risk Signals

Genome-wide association studies (GWAS) have uncovered hundreds of genetic variants associated with the risk of common cancers. However, most of these variants lie in non-coding regions, where the underlying biological mechanisms are poorly understood. Transcriptome-wide association studies (TWAS) help bridge this gap by examining how genetic variants regulate gene expression. Yet traditional TWAS focus only on total gene expression, potentially overlooking alternative splicing, which can produce multiple RNA and protein isoforms from the same gene, each of which may contribute differently to cancer risk.

Our study began with a central question: Are we missing critical cancer-risk signals by focusing solely on total gene expression in TWAS analyses?

To address it, we applied an isoform-level TWAS (isoTWAS) framework that integrates genetic and isoform-level transcriptomic variation with GWAS summary statistics. Using isoTWAS, we investigated associations between isoform expression and 12 cancer outcomes, including breast, endometrial, colorectal, lung, ovarian, and prostate cancers and their subtypes. We then compared isoTWAS with traditional gene-level TWAS to evaluate the benefits of incorporating isoform-level resolution.
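For readers less familiar with summary-statistic TWAS, the core test can be sketched in a few lines. The sketch below implements the widely used FUSION/S-PrediXcan-style statistic, z = wᵀz_GWAS / sqrt(wᵀΣw), where w are per-SNP weights from an expression prediction model and Σ is the LD matrix among those SNPs. It is a minimal illustration under that assumption, not the isoTWAS implementation: isoTWAS fits its own isoform-level prediction models, and all weights, z-scores, and LD values here are hypothetical. At the isoform level, the same kind of statistic is computed once per isoform, each with its own weights.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score for one expression feature (gene or isoform).

    weights: per-SNP weights from a genotype -> expression model (hypothetical)
    gwas_z:  GWAS z-scores for the same SNPs, aligned to the same effect alleles
    ld:      SNP-by-SNP LD correlation matrix, e.g. from a reference panel
    """
    m = len(weights)
    if len(gwas_z) != m or len(ld) != m:
        raise ValueError("weights, gwas_z and ld must describe the same SNPs")
    # Numerator: weighted sum of GWAS z-scores, w^T z.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Denominator: sqrt(w^T LD w), the null standard deviation of the numerator.
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical example: two isoforms of one gene whose models use the same two SNPs.
ld = [[1.0, 0.2], [0.2, 1.0]]
gwas_z = [4.0, 3.0]
for name, w in [("isoform_A", [0.5, 0.3]), ("isoform_B", [-0.1, 0.05])]:
    z = twas_z(w, gwas_z, ld)
    print(name, round(z, 2), two_sided_p(z))
```

Because each isoform carries its own weights, two isoforms of the same gene can yield z-scores of opposite sign, which is one way effects can cancel out when only total gene expression is modeled.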

What Did We Find?

Overall, isoTWAS identified more informative signals than gene-level TWAS. Across the 12 cancer outcomes, isoTWAS identified about 164% more risk-linked genes than gene-level TWAS, revealing associations that gene-level TWAS missed. isoTWAS also showed an average 25.3–37.4% increase in effective sample size over gene-level TWAS, meaning it can achieve comparable power with a smaller sample size. These discoveries are not only statistically significant but also biologically meaningful: 19.9% of isoTWAS-prioritized genes lie in regions of high selective constraint, suggesting evolutionary importance. isoTWAS also uncovers pan-cancer risk signals that gene-level TWAS misses. It identified 34 genes associated with five or more cancer types, signals absent from gene-level TWAS, and these genes are highly enriched as downstream targets of key oncogenic transcription factors.
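Counting risk-linked genes from isoform-level results requires collapsing several correlated isoform p-values into one gene-level result. A common choice for this, assumed here rather than stated in this post, is the Cauchy combination test (ACAT). The sketch below shows that technique with equal weights by default; the exact aggregation and multiple-testing procedure used in the study may differ.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test (ACAT).

    Each p-value is mapped to a standard Cauchy variate. A weighted average of
    such variates is exactly standard Cauchy when the inputs are independent,
    and approximately so in the tail under arbitrary dependence, which makes
    the test usable for correlated tests such as isoforms of one gene.
    """
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k  # equal weights summing to 1
    total = 0.0
    for p, w in zip(pvalues, weights):
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        # For tiny p, tan((0.5 - p) * pi) loses precision; use its asymptote 1 / (p * pi).
        term = 1.0 / (p * math.pi) if p < 1e-16 else math.tan((0.5 - p) * math.pi)
        total += w * term
    t = total / sum(weights)
    if t > 1e15:
        # Cauchy tail asymptote avoids 0.5 - atan(t) / pi rounding to zero.
        return 1.0 / (t * math.pi)
    return 0.5 - math.atan(t) / math.pi


# Hypothetical isoform-level p-values for one gene.
print(cauchy_combination([1e-8, 0.5, 0.9]))
```

The combined p-value is driven mainly by the smallest inputs, so a single strongly associated isoform can flag a gene even when its other isoforms show no signal.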

Beyond increasing the discovery of susceptibility genes, isoform expression tags more GWAS loci and explains more of the overall SNP heritability. isoTWAS captures 52.4% more GWAS loci than TWAS and uncovers an additional 2,911 significant genes outside known loci. We also estimated the proportion of total SNP heritability mediated by gene- and isoform-level expression and found that isoform-level expression explains 62.7% more of the SNP heritability of cancer risk than gene-level expression.

We further focused on nine loci identified by isoTWAS that showed isoform-eQTL colocalization with GWAS signals but no gene-level eQTL colocalization. Among them, CLPTM1L, LAMC1, and BABAM1 stand out because of their previously reported pleiotropic associations across multiple cancer types. CLPTM1L lies near the TERT locus, a well-known region linked to pan-cancer risk; LAMC1 has been associated with endometrial cancer, glioma, and prostate cancer; and BABAM1 is an established GWAS locus with broad relevance to cancer risk.

Case Study: BABAM1 and Breast Cancer

We focus here on BABAM1 because of its unique exon structure and its interaction with BRCA1, a well-known breast cancer susceptibility gene. Nine of its isoforms were associated with breast cancer in our isoTWAS, and fine-mapping prioritized the transcript ENST00000599474.5. Multiple genome-wide significant SNPs associated with breast cancer lie within the BABAM1 gene body, yet none showed a gene-level eQTL signal. To better understand the regulatory mechanisms at this locus, we integrated functional annotations from ENCODE and the Roadmap Epigenomics Project, splicing QTL (sQTL) summary statistics from GTEx, and splicing effect predictions from SpliceAI. We observed splicing events associated with SNPs in perfect LD with the lead isoQTL, suggesting a regulatory role in splicing. Additionally, a strong CCCTC-binding factor (CTCF) peak at the terminal exon supports the hypothesis that transcriptional termination contributes to isoform-specific regulation of expression in this region. While further research is needed, our findings highlight an important opportunity: integrating sQTLs and isoQTLs may enhance the discovery of transcriptomic mechanisms underlying cancer risk.
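"Perfect LD" means two variants have r² = 1, so in that sample their genotypes are perfectly correlated and an association seen at one cannot be statistically distinguished from the other. The sketch below computes r² as the squared Pearson correlation between allele dosages; the genotype vectors are made up for illustration, and in practice LD is estimated from a reference panel with dedicated tools such as PLINK.

```python
def ld_r2(dosages_a, dosages_b):
    """LD r^2 between two SNPs: squared Pearson correlation of allele dosages.

    dosages_a, dosages_b: counts of the alternative allele (0, 1 or 2) for the
    same individuals, in the same order.
    """
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equally long genotype vectors with n >= 2")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a monomorphic SNP has undefined LD")
    return cov * cov / (var_a * var_b)


# Hypothetical genotypes for six individuals.
lead_isoqtl = [0, 1, 2, 1, 0, 2]
proxy_snp = [2, 1, 0, 1, 2, 0]      # always carries the opposite allele: perfect LD
unlinked_snp = [1, 0, 2, 2, 0, 1]
print(ld_r2(lead_isoqtl, proxy_snp))     # r^2 = 1.0
print(ld_r2(lead_isoqtl, unlinked_snp))  # r^2 = 0.25, far from perfect LD
```

Note that r² is insensitive to which allele is coded as the alternative, which is why the proxy SNP above, coded in the opposite direction, still reaches r² = 1.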

So, back to our question: Are we overlooking critical cancer-risk signals by focusing solely on total gene expression in TWAS analyses?

The answer is yes. By examining only total gene expression, we may ignore important effects driven by specific isoforms. Our findings suggest that isoform-level TWAS can uncover regulatory mechanisms and prioritize risk genes that would otherwise remain undetected in gene-level TWAS.

Limitations and Future Directions

Our work has three main limitations. First, fine-mapping isoform associations is challenging because of horizontal pleiotropy, which can reduce power and increase false-positive rates. Second, isoform expression was quantified from short-read RNA-seq data with Salmon, which depends on existing transcript annotations and may introduce uncertainty; long-read RNA-seq could offer more accurate isoform resolution. Third, our analyses focused on cohorts of European ancestry because of sample size constraints. Future work should prioritize more ancestrally diverse datasets and develop multi-ancestry models.