Behind the Paper

Unraveling the Complexity of Gene-Environment Interactions in Noncommunicable Diseases through Multi-Omics Research

Multi-omics research is revolutionizing how we study gene-environment interactions in chronic diseases. Our review explores how recent advances in multi-omics and AI/ML approaches enhance disease prediction, biomarker discovery, and precision medicine.

By Dr. Robel Alemu (Lead Author), Associate Professor Azmeraw Amare (Senior Author), Associate Professor Tesfaye Mersha (Senior Author) and Collaborators

Understanding the Genetic and Environmental Complexity of Chronic Diseases

Noncommunicable diseases (NCDs), such as cardiovascular diseases, diabetes, cancers, and chronic respiratory conditions, are the leading causes of mortality worldwide, accounting for over 74% of global deaths1. Unlike infectious diseases, which are caused by specific pathogens, NCDs arise from a complex interplay between genetic predisposition and environmental exposures—a relationship broadly referred to as gene-environment (GxE) interactions2. While genetic factors play a crucial role in shaping an individual’s risk for these diseases, environmental influences such as diet, pollution, stress, and lifestyle choices can modify the way genes function, influencing disease onset and progression.

This intricate biological interplay means that individuals carrying the same genetic variant may experience different disease outcomes depending on their environment. For instance, certain genetic variants increase the risk of Parkinson’s disease in individuals exposed to pesticides3, while the impact of the FTO gene on obesity risk varies based on lifestyle factors like diet, physical activity, and sleep patterns4. These examples illustrate why a one-size-fits-all approach to medicine is inadequate and why studying both genetic and environmental factors together is essential for advancing precision medicine.
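One common way to formalize this idea is a regression model with a product term between genotype and exposure. The sketch below uses a logistic model with made-up coefficients (illustrative assumptions, not estimates from any study cited here) to show how the same variant can carry a different risk depending on exposure.

```python
import math

def disease_risk(genotype, exposed, b0=-3.0, b_g=0.2, b_e=0.5, b_gxe=0.8):
    """Logistic model with a gene-environment product term.

    genotype: number of risk alleles (0, 1, or 2)
    exposed:  1 if exposed to the environmental factor, else 0
    All coefficients are hypothetical values chosen for illustration.
    """
    log_odds = b0 + b_g * genotype + b_e * exposed + b_gxe * genotype * exposed
    return 1.0 / (1.0 + math.exp(-log_odds))

def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def genotype_odds_ratio(exposed):
    """Odds ratio per risk allele within one exposure group."""
    return odds(disease_risk(1, exposed)) / odds(disease_risk(0, exposed))

if __name__ == "__main__":
    for exposed in (0, 1):
        for genotype in (0, 1, 2):
            risk = disease_risk(genotype, exposed)
            print(f"exposed={exposed} alleles={genotype} risk={risk:.3f}")
    print(f"per-allele odds ratio, unexposed: {genotype_odds_ratio(0):.2f}")
    print(f"per-allele odds ratio, exposed:   {genotype_odds_ratio(1):.2f}")
```

With a nonzero interaction coefficient (`b_gxe`), the per-allele odds ratio differs between exposed and unexposed groups, which is the statistical signature that GxE studies look for.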

How Multi-Omics Is Transforming Our Understanding of Disease Mechanisms

Genomic research, particularly genome-wide association studies (GWAS), has identified thousands of genetic variants linked to NCDs5. However, these studies alone have failed to explain the full heritability of many diseases2,6. This gap has fueled the rise of multi-omics research, which integrates various molecular layers—genomics, epigenomics, transcriptomics, proteomics, and metabolomics—to provide a more comprehensive understanding of biological systems.

By integrating multi-omics data with exposome data—which includes lifestyle, environmental, and social determinants of health—researchers can map how environmental influences shape biological pathways that drive disease7. These approaches also enhance GxE interaction analyses by increasing statistical power, mitigating the multiple testing burden, and prioritizing biologically relevant signals. A key advantage of multi-omics integration is its ability to prioritize functionally relevant genetic variants, reducing the need to test all possible SNP-environment interactions. Functional annotations from resources such as ENCODE and GTEx allow researchers to focus on regulatory variants that are more likely to mediate GxE interactions8. Additionally, integrating gene expression and epigenomic data enables the detection of environmentally responsive loci, which may remain undetected in traditional GWAS due to power limitations9.
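To make the multiple-testing point concrete, here is a back-of-the-envelope sketch. It assumes a simple Bonferroni correction, chosen here only for illustration (the review does not prescribe a specific correction), and the SNP and exposure counts are hypothetical.

```python
def bonferroni_threshold(n_snps, n_exposures, alpha=0.05):
    """Per-test significance threshold when every SNP is tested against every exposure."""
    n_tests = n_snps * n_exposures
    return alpha / n_tests

# Hypothetical scenario: 1,000,000 SNPs and 10 exposures, all pairs tested.
genome_wide = bonferroni_threshold(1_000_000, 10)

# Hypothetical scenario: functional annotation narrows the set to 10,000 regulatory SNPs.
prioritized = bonferroni_threshold(10_000, 10)

print(f"all SNPs:         p < {genome_wide:.1e}")
print(f"prioritized SNPs: p < {prioritized:.1e}")
print(f"threshold relaxes by a factor of {prioritized / genome_wide:.0f} after prioritization")
```

Fewer tests mean a less stringent per-test threshold, so an interaction of a given size is easier to detect at the same sample size.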

Expanding Global Representation in Omics Research

One of the most pressing challenges in genetic and multi-omics research is the underrepresentation of non-European populations. More than 85% of GWAS participants are of European ancestry10, which limits the ability of polygenic risk scores (PGS) to accurately predict disease risk for individuals of non-European descent11. This lack of diversity not only exacerbates health disparities but also restricts the discovery of important genetic variants that may be more relevant in certain populations.
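For readers unfamiliar with PGS, a score is typically computed as a weighted sum of risk-allele counts, with weights taken from GWAS effect sizes. The sketch below uses invented variant names and weights purely for illustration. Because the weights are mostly estimated in European-ancestry cohorts, the same formula can predict less accurately in other populations.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Variants without a known effect size are skipped.
    """
    return sum(
        effect_sizes[variant] * count
        for variant, count in allele_counts.items()
        if variant in effect_sizes
    )

# Hypothetical per-allele effect sizes (for example, log odds ratios from a GWAS).
effect_sizes = {"variant_a": 0.12, "variant_b": -0.05, "variant_c": 0.30}

# Hypothetical genotype for one individual.
individual = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_score(individual, effect_sizes)
print(f"PGS = {score:.2f}")
```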

However, this diversity gap is not limited to genomic data; the disparities are even more pronounced in other omic layers, such as epigenomics, proteomics, and metabolomics12. The lack of multi-ancestry datasets for these molecular layers makes it challenging to develop equitable, ancestry-inclusive precision medicine approaches. Expanding global genomic and multi-omics diversity is crucial. Studies focusing on African ancestry populations have already led to groundbreaking discoveries, such as the identification of APOL1 variants linked to kidney disease13 and PCSK9 variants that contributed to cholesterol-lowering drug development14. By increasing representation in multi-omics datasets, we can improve the accuracy of genetic predictions across diverse populations, ensuring that advances in precision medicine benefit everyone.

AI and Machine Learning in Multi-Omics Research: Addressing GxE Challenges

Gene-environment (GxE) interaction studies face persistent challenges, particularly limited sample sizes and the burden of multiple testing when analyzing high-dimensional biological data15. Traditional approaches often struggle with statistical power constraints, making it difficult to detect meaningful interactions amidst substantial noise and complex dependencies. Multi-omics integration, combined with AI and machine learning (ML), offers new strategies to tackle these limitations by improving feature selection, enhancing interpretability, and refining statistical models.

AI and ML are transforming multi-omics research by enabling the integration of large, complex datasets and identifying hidden patterns that traditional methods often miss. Various computational techniques—including unsupervised learning methods like Principal Component Analysis (PCA) and t-SNE, as well as supervised approaches such as Support Vector Machines (SVMs), Random Forests, and deep learning—help uncover key biological insights by linking genetic and environmental factors to disease phenotypes16. By leveraging these approaches, researchers can prioritize functionally relevant signals, reducing the multiple-testing burden and mitigating the statistical power limitations of traditional GxE interaction analyses.

For example, deep learning models such as those developed by Wu et al. (2023) simultaneously estimate main effects and GxE interactions, overcoming hierarchical constraints in conventional regression-based models9. Similarly, Bayesian ML approaches like mixed-effects Bayesian additive regression trees (mixedBART) allow for higher-order GxE interaction detection, even in datasets with small sample sizes, by modeling relationships without requiring strict parametric assumptions17. These methods enhance signal detection, providing greater statistical power for GxE analysis compared to traditional approaches.

However, AI-driven approaches are not without challenges. Bias in training datasets remains a major concern, as underrepresentation of certain populations can lead to skewed model performance and exacerbate disparities in GxE research. Additionally, the “black box” nature of deep learning models makes it difficult to interpret findings, raising concerns about clinical transparency and trust. Ethical considerations, such as data privacy and responsible AI implementation, must also be addressed to ensure equitable, reliable, and scalable applications of AI in multi-omics research.

Real-World Applications of Multi-Omics Research in Medicine

Multi-omics approaches are already shaping clinical decision-making and treatment strategies in various ways. For example, recent studies have identified genes that protect neurons from oxidative stress, a major contributor to neurodegenerative diseases like Alzheimer’s and Parkinson’s. In pharmacogenomics, multi-omics research has enabled personalized drug dosing, such as tailoring warfarin prescriptions based on CYP2C9 and VKORC1 genetic variants18. Similarly, genetic testing for BRCA1 and BRCA2 mutations is now routinely used in cancer care to identify patients who may benefit from targeted therapies like PARP inhibitors19.

These advances underscore the potential of multi-omics research to revolutionize medicine, moving us closer to individualized prevention and treatment strategies that consider both genetic makeup and environmental influences.

A Global Call to Action

To fully realize the potential of multi-omics research, scientists, policymakers, and funding agencies must prioritize global inclusivity and data-sharing initiatives. Expanding research in underrepresented populations, strengthening research infrastructure in low- and middle-income countries, improving analytical techniques for dissecting GxE interactions, and establishing global standards for data sharing and integration will be key to accelerating scientific discoveries and improving health outcomes worldwide.

As lead author Dr. Robel Alemu emphasizes, “Our review highlights the transformative power of multi-omics research in revealing the biological mechanisms behind chronic diseases. However, this potential will remain unrealized unless we address the significant equity gaps in omics research and ensure that these advancements benefit all populations.”

Join the conversation! How do you see multi-omics shaping the future of precision medicine? Let us know your thoughts in the comments! 🚀

Robel Alemu (Ph.D.)  

Postdoctoral Researcher, University of California Los Angeles

Broad Institute of MIT and Harvard; The University of Adelaide Medical School

Email: robel.alemu@anderson.ucla.edu; ralemu@broadinstitute.org  

References

    1. World Health Organization. Global Action Plan for the Prevention and Control of Noncommunicable Diseases, 2013-2020.
    2. Calcaterra, V. & Zuccotti, G. Non-Communicable Diseases and Rare Diseases: A Current and Future Public Health Challenge within Pediatrics. Children vol. 9 Preprint at https://doi.org/10.3390/children9101491 (2022).
    3. Ngo, K. J. et al. Lysosomal genes contribute to Parkinson’s disease near agriculture with high intensity pesticide use. NPJ Parkinsons Dis 10, (2024).
    4. Young, A. I., Wauthier, F. & Donnelly, P. Multiple novel gene-by-environment interactions modify the effect of FTO variants on body mass index. Nat Commun 7, (2016).
    5. Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: Realizing the promise. American Journal of Human Genetics vol. 110 179–194 Preprint at https://doi.org/10.1016/j.ajhg.2022.12.011 (2023).
    6. Sadee, W. et al. Missing heritability of common diseases and treatments outside the protein-coding exome. Human Genetics vol. 133 1199–1215 Preprint at https://doi.org/10.1007/s00439-014-1476-7 (2014).
    7. Qi, T., Song, L., Guo, Y., Chen, C. & Yang, J. From genetic associations to genes: methods, applications, and challenges. Trends in Genetics Preprint at https://doi.org/10.1016/j.tig.2024.04.008 (2024).
    8. Nam, Y. et al. Harnessing Artificial Intelligence in Multimodal Omics Data Integration: Paving the Path for the Next Frontier in Precision Medicine. Annual Review of Biomedical Data Science (2024) doi:10.1146/annurev-biodatasci-102523.
    9. Wu, S., Xu, Y., Zhang, Q. & Ma, S. Gene–environment interaction analysis via deep learning. Genetic Epidemiology (2023).
    10. Mills, M. C. & Rahal, C. The GWAS Diversity Monitor tracks diversity by disease in real time. Nat Genet doi:10.5281/zenodo.3600471.
    11. Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 51, 584–591 (2019).
    12. Westerman, K. E. & Sofer, T. Many roads to a gene-environment interaction. American Journal of Human Genetics vol. 111 626–635 Preprint at https://doi.org/10.1016/j.ajhg.2024.03.002 (2024).
    13. Genovese, G. et al. A risk allele for focal segmental glomerulosclerosis in African Americans is located within a region containing APOL1 and MYH9. Kidney Int 78, 698–704 (2010).
    14. Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat Genet 37, 161–165 (2005).
    15. Krassowski, M., Das, V., Sahu, S. K. & Misra, B. B. State of the Field in Multi-Omics Research: From Computational Needs to Data Mining and Sharing. Frontiers in Genetics vol. 11 Preprint at https://doi.org/10.3389/fgene.2020.610798 (2020).
    16. Wekesa, J. S. & Kimwele, M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Frontiers in Genetics vol. 14 Preprint at https://doi.org/10.3389/fgene.2023.1199087 (2023).
    17. Spanbauer, C. & Sparapani, R. Nonparametric machine learning for precision medicine with longitudinal clinical trials and Bayesian additive regression trees with mixed models. Stat Med 40, 2665–2691 (2021).
    18. The International Warfarin Pharmacogenetics Consortium. Estimation of the Warfarin Dose with Clinical and Pharmacogenetic Data. New England Journal of Medicine 360, 753–764 (2009).
    19. Olopade, O. I., Grushko, T. A., Nanda, R. & Huo, D. Advances in breast cancer: Pathways to personalized medicine. Clinical Cancer Research vol. 14 7988–7999 Preprint at https://doi.org/10.1158/1078-0432.CCR-08-1211 (2008).