Pediatric leukemia is a highly heterogeneous disease characterized by alterations across multiple molecular layers. While each layer has been extensively studied, their interplay remains poorly understood. In this study, we aimed to explore how molecular and drug response profiles jointly influence clinical outcomes in childhood leukemia.
What do we know about leukemia?
Childhood leukemia is a devastating disease that affects children from infancy through adolescence. Not long ago, most children diagnosed with leukemia did not survive, due to both the aggressive nature of the disease and the lack of effective treatment.
Leukemia is driven by large-scale chromosomal abnormalities, including changes in chromosomal number and structural alterations. These events give rise to a wide spectrum of disease subtypes and influence multiple biological layers, such as gene expression and epigenetic regulation.
What has changed over the years?
Today, most children with leukemia survive, largely due to risk-adapted treatment strategies. These advances have been made possible by integrating clinical variables, such as age and white blood cell count, with molecular information, including leukemia subtypes and minimal residual disease.
Traditional diagnostic methods such as FISH, karyotyping, and RT-PCR have been complemented by next-generation sequencing technologies, such as RNA sequencing and whole-genome sequencing. These approaches have enabled more precise characterization of patient profiles that do not fit into conventional diagnostic categories.
Challenges in leukemia research
Despite these advances, some patients experience relapses, develop treatment resistance, or die from therapy-related toxicities. The multifactorial nature of leukemia necessitates integrating multiple layers of biological information. A major challenge is that multi-omics datasets are often incomplete, as not all data types are available for every patient. This limits statistical power and hinders integrative analysis.
Multi-omics integration in pediatric ALL
Our research group has generated one of the largest multi-omics datasets in pediatric leukemia, comprising gene expression, DNA methylation, mutational, and drug response data from over 1000 patients diagnosed and treated in the Nordic countries.
Although the dataset is imbalanced because not every data type is available for every patient, we applied an integrative framework, Multi-Omics Factor Analysis (MOFA), that allows all available patient data to be included. Like principal component analysis, MOFA summarizes the data as a small number of latent factors. We generated ten such factors, which we refer to as cross-modal elements (CMEs).
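The idea of keeping patients who lack some data types can be illustrated with a small sketch. This is not MOFA itself and not the pipeline used in the study: it is a plain masked low-rank factorization fitted by alternating ridge regression, in which missing values are simply skipped. The function name, the parameters, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def masked_factorization(views, n_factors=2, n_iter=50, ridge=1e-2, seed=0):
    """Fit X ~ Z @ W.T using only observed entries (NaN marks missing data).

    Illustrative stand-in for a latent factor model; not the MOFA algorithm.
    """
    X = np.hstack(views)                  # patients x (features of all data types)
    mask = ~np.isnan(X)                   # True where a value was measured
    n, d = X.shape
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, n_factors))   # patient-level latent factors
    W = rng.normal(size=(d, n_factors))   # feature loadings
    reg = ridge * np.eye(n_factors)
    for _ in range(n_iter):
        # Update each patient's factors from the features measured for that patient.
        for i in range(n):
            Wo = W[mask[i]]
            Z[i] = np.linalg.solve(Wo.T @ Wo + reg, Wo.T @ X[i, mask[i]])
        # Update each feature's loadings from the patients who have that feature.
        for j in range(d):
            Zo = Z[mask[:, j]]
            W[j] = np.linalg.solve(Zo.T @ Zo + reg, Zo.T @ X[mask[:, j], j])
    return Z, W

# Simulated toy data (assumption: two data types driven by two shared factors).
rng = np.random.default_rng(1)
z_true = rng.normal(size=(60, 2))
expr = z_true @ rng.normal(size=(2, 8))   # stand-in for gene expression
meth = z_true @ rng.normal(size=(2, 6))   # stand-in for DNA methylation
meth[40:] = np.nan                        # last 20 patients lack the second data type

Z, W = masked_factorization([expr, meth], n_factors=2)
X = np.hstack([expr, meth])
observed = ~np.isnan(X)
rmse = np.sqrt(np.mean((X - Z @ W.T)[observed] ** 2))
print(Z.shape, round(float(rmse), 3))
```

In this toy setting every patient receives factor values, including the 20 patients for whom the second data type is missing, which is the property that makes it possible to use an incomplete cohort in full.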
Multi-omics analyses identify biologically and clinically relevant information
Each CME is defined by a unique combination of top-ranked signatures linked to specific biological pathways. In some cases, pathways are shared across CMEs, highlighting the complex and interconnected nature of leukemia biology.
Additionally, we identified a small set of DNA methylation sites positively associated with response to the chemotherapeutic drug doxorubicin. Using only 17 methylation markers, we were able to stratify patients into two distinct groups.
Within a typically favorable subtype (high hyperdiploidy), around 30% of patients belonged to the high-methylation group, which was associated with poorer outcomes. This association remained significant even after adjusting for established clinical factors such as risk group and treatment protocol.
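As a rough illustration of how a small marker panel can split patients into two groups, the sketch below averages the methylation values of 17 markers per patient and separates the scores with a one-dimensional two-means procedure. The summary does not state how the grouping was actually performed, so the scoring rule, the function name, and the simulated values are assumptions made for illustration only.

```python
import numpy as np

def two_group_split(beta, n_iter=100):
    """Split patients into low/high groups by mean methylation across markers.

    beta: patients x markers array of methylation values in [0, 1].
    Returns a boolean array (True = high-methylation group) and the cut-off.
    Hypothetical helper; not the method used in the study.
    """
    score = beta.mean(axis=1)
    lo, hi = score.min(), score.max()
    if lo == hi:                          # all patients identical: no split possible
        return np.zeros(len(score), dtype=bool), lo
    for _ in range(n_iter):
        cut = (lo + hi) / 2               # midpoint between the two group means
        high = score > cut
        new_lo, new_hi = score[~high].mean(), score[high].mean()
        if np.isclose(new_lo, lo) and np.isclose(new_hi, hi):
            break                         # group means have stopped moving
        lo, hi = new_lo, new_hi
    return high, cut

# Simulated methylation values (not real patient data): 60 low and 25 high patients.
rng = np.random.default_rng(0)
low_patients = np.clip(rng.normal(0.2, 0.1, size=(60, 17)), 0, 1)
high_patients = np.clip(rng.normal(0.7, 0.1, size=(25, 17)), 0, 1)
beta = np.vstack([low_patients, high_patients])

high, cut = two_group_split(beta)
print(int(high.sum()), "patients in the high-methylation group")
```

A real analysis would then relate the group label to outcome in a survival model that also includes clinical covariates such as risk group and treatment protocol.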
Finally, incorporation of top-ranked drug response data from several CMEs improved the predictive performance of baseline clinical models.
Altogether, this study advances our understanding of outcome heterogeneity in pediatric leukemia.
What do these findings mean for the future?
While single-omics studies have provided valuable insights into leukemia biology, multi-omics integration offers a more comprehensive view by bridging information across molecular layers.
Leukemia, like many cancers, does not operate within isolated biological systems. Interactions between tumor cells and their microenvironment, and across molecular modalities, play a critical role in disease progression and treatment response.
Because collecting multiple data modalities in clinical settings is often costly, future approaches may rely on synthetic data generation or on methods that integrate partially overlapping datasets (e.g., IntegrAO, Ma et al. 2025).
To facilitate reproducible research and enable other groups to compare their findings with ours, we aim to develop a user-friendly web application for this data repository.