From overwhelmed researchers to a roadmap for the field
When we set out to write this review, we weren't planning to create just another methods paper. We were responding to a problem we kept encountering: brilliant experimental biologists sitting on treasure troves of bulk RNA-seq data, but paralyzed by the sheer number of computational options available to analyze it.
Over the past 15 years, we have developed several deconvolution tools (DeMix, DeMixT, and DeMixSC). Through this work, we've watched the field explode from a handful of methods to more than 40 different approaches. Each new tool promised to solve a specific problem, but collectively, they created a new one: how do researchers choose?
The "apples to oranges" problem
The turning point came during conversations with collaborators who would ask variations of the same question: "Which deconvolution method should I use?" What struck us was that this seemingly simple question had no simple answer, not because the methods were poorly designed, but because researchers were often comparing fundamentally different tools designed for fundamentally different purposes.
A method optimized for mapping immune infiltration in blood samples might be poorly suited for dissecting the complex interplay between tumor cells and their microenvironment. Methods validated on stable, healthy tissues often fail when confronted with the messy reality of cancer, where tumor cells shift states, fibroblasts become activated in ways not seen in normal tissue, and cellular plasticity violates the core assumptions of many algorithms.
Bridging transcriptomics and tumor evolution
Our collaboration with Dr. Peter Van Loo brought together complementary perspectives that proved essential. His extensive work on tumor evolution, subclonal reconstruction, and copy number analysis provided crucial insights into why cancer is fundamentally different from other tissues when it comes to deconvolution.
From a tumor evolution perspective, cancer isn't just heterogeneous; it's actively evolving. Subclones emerge, compete, and are shaped by treatment and immune pressure. The transcriptional states we're trying to deconvolve aren't static snapshots but moving targets. A tumor sample might contain multiple subclones, each with distinct transcriptional programs, all mixed together with stromal and immune components.
This insight fundamentally shaped our approach. We couldn't simply adapt methods from immunology or developmental biology. We needed to address the unique challenge that tumor cells don't fit neatly into predefined cell-type boxes. They exist along continua of states, driven by both intrinsic evolutionary processes and extrinsic microenvironmental pressures.
One critical insight from integrating genomic and transcriptomic perspectives was the underexplored role of somatic copy number alterations. Copy number changes are ubiquitous in cancer and directly affect gene expression levels, yet most deconvolution methods don't account for them. When a genomic region is amplified in tumor cells, genes in that region show elevated bulk expression, but is that because there are more tumor cells or because each tumor cell carries more copies of those genes? This realization highlighted an important future direction: integrating multi-omic data to build more robust deconvolution models.
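The ambiguity can be made concrete with a deliberately simplified, single-gene toy model. This is a minimal sketch under stated assumptions, not how DeMix, DeMixT, or any published method works: it assumes tumor expression scales linearly with copy number relative to a diploid baseline, that normal cells are diploid, and it uses hypothetical numbers and hypothetical helper names (`bulk_expression`, `infer_purity`).

```python
def bulk_expression(purity, tumor_cn, tumor_expr_diploid, normal_expr):
    """Toy bulk model for one gene (hypothetical, for illustration only).

    Assumes tumor expression scales linearly with copy number relative
    to a diploid baseline (cn = 2) and that normal cells are diploid.
    """
    tumor_part = purity * (tumor_cn / 2) * tumor_expr_diploid
    normal_part = (1 - purity) * normal_expr
    return tumor_part + normal_part


def infer_purity(bulk, tumor_cn, tumor_expr_diploid, normal_expr):
    """Invert the toy model for tumor purity, given an assumed tumor copy number."""
    tumor_level = (tumor_cn / 2) * tumor_expr_diploid
    return (bulk - normal_expr) / (tumor_level - normal_expr)


# Hypothetical gene amplified to 4 copies in a sample that is 40% tumor cells.
observed = bulk_expression(purity=0.4, tumor_cn=4,
                           tumor_expr_diploid=10.0, normal_expr=2.0)

# Assuming a diploid tumor attributes the extra signal to more tumor cells.
naive = infer_purity(observed, tumor_cn=2, tumor_expr_diploid=10.0, normal_expr=2.0)

# Accounting for the amplification recovers the true purity.
aware = infer_purity(observed, tumor_cn=4, tumor_expr_diploid=10.0, normal_expr=2.0)

print(f"bulk={observed:.2f}  naive purity={naive:.2f}  CN-aware purity={aware:.2f}")
```

In this toy setting the same bulk value (9.2) is explained either by a 90% tumor sample with a diploid gene or by a 40% tumor sample with a fourfold amplification; expression alone cannot tell them apart, which is why copy number information from DNA is so useful here.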
Building a framework for complex biology
Rather than cataloging methods, we built a decision framework that accounts for cancer's biological complexity. We started with fundamental questions: What are you trying to learn? Are you mapping the tumor microenvironment? Identifying tumor subtypes? Extracting tumor-specific expression profiles? Understanding how cellular composition changes with treatment?
Each goal requires different approaches and must account for different aspects of cancer biology. For immune profiling, reference-based methods work well because immune cell types are relatively stable and well-characterized. But for tumor cells, which can be highly plastic and exist in transitional states, we need more flexible approaches like semi-reference-based methods that don't assume stability.
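To show what "reference-based" means in practice, here is a minimal sketch of one common formulation: model a bulk profile as a non-negative mixture of known cell-type signatures and solve with non-negative least squares (NNLS). It is a generic illustration, not the algorithm of any specific tool discussed in the review; the signature matrix, gene count, and proportions are hypothetical, and it uses `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(signature, bulk):
    """Estimate cell-type proportions from one bulk expression profile.

    signature: genes x cell types reference matrix (assumed known and stable).
    bulk: length-genes vector of bulk expression.
    """
    weights, _residual_norm = nnls(signature, bulk)
    total = weights.sum()
    if total == 0:
        raise ValueError("NNLS returned all-zero weights; check inputs")
    # Rescale non-negative weights so they sum to 1 (proportions).
    return weights / total


# Hypothetical reference: 6 marker genes x 3 immune cell types.
signature = np.array([
    [10.0, 0.5, 0.2],
    [8.0, 1.0, 0.1],
    [0.3, 12.0, 0.5],
    [0.2, 9.0, 1.0],
    [0.1, 0.4, 15.0],
    [0.5, 0.3, 11.0],
])
true_props = np.array([0.5, 0.3, 0.2])

# Noise-free synthetic bulk sample built from the reference itself.
bulk = signature @ true_props
props = estimate_proportions(signature, bulk)
print(np.round(props, 3))
```

The fit works here only because every cell type in the mixture is present in the reference with a fixed profile. Tumor cells in transitional or plastic states break exactly that assumption, which is why the more flexible semi-reference-based approaches become necessary.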
The evolutionary perspective also informed our emphasis on temporal deconvolution as a future direction. To understand treatment resistance or disease progression, we need methods that can track how cellular composition evolves over time, potentially integrating both transcriptomic and genomic changes.
Two critical gaps
As we surveyed the field, two major gaps became apparent.
First, benchmarking data. Most methods are benchmarked on simulated data or non-cancer tissues. We urgently need more cancer-specific benchmark datasets with ground truth, ideally using matched bulk and single-cell data from actual tumors, and paired with genomic characterization. Benchmarks should include tumors with different mutational landscapes, copy number profiles, and clonal architectures to test method robustness.
Second, tumor-specific deconvolution. While immune profiling has advanced tremendously, tools for accurately characterizing tumor cells and stromal components lag behind. This is problematic because tumor-intrinsic features, shaped by both genomic alterations and transcriptional plasticity, often drive patient outcomes. Recent work applying deconvolution to clinical cohorts illustrates this potential: tumor-specific total mRNA expression (TmS), derived from joint bulk RNA/DNA-seq deconvolution, stratified chemotherapy response in triple-negative breast cancer across four multi-ethnic cohorts, revealing population-specific tumor microenvironment mechanisms that existing molecular subtypes failed to capture (Dai et al., Cell Reports Medicine 7, 102610, 2026). Such applications underscore the need for more tools built specifically for tumor cell characterization at scale.
Looking forward
The field continues to evolve. We're excited about several emerging directions that build on both transcriptomic and genomic foundations.
Spatial integration combines deconvolution with spatial transcriptomics to reveal not just what cells are present, but also how they're organized. This spatial context is crucial for understanding cell-cell interactions that drive tumor behavior.
Multi-omic integration incorporating copy number alterations, mutational profiles, and epigenetic data alongside transcriptomics can provide more robust characterization of tumor states, particularly for distinguishing genomically driven expression changes from microenvironment-driven ones.
Temporal dynamics methods will track how cellular composition shifts alongside tumor evolution during treatment, which is essential for understanding resistance mechanisms.
Adapting methods to FFPE (formalin-fixed, paraffin-embedded) samples will unlock decades of annotated clinical specimens with long-term follow-up data, which are essential for validating clinical utility.
Perhaps most importantly, we need to make these methods accessible to researchers without extensive computational training. By providing clear guidance grounded in an understanding of both computational methods and cancer biology, we hope to empower the broader cancer research community.
A living resource
This review represents our attempt to impose order on a rapidly evolving field while acknowledging its fundamental complexity. We hope it serves as both a current resource and a framework for evaluating future developments, helping researchers move from feeling overwhelmed to feeling empowered to make informed decisions.
The journey from developing individual tools to synthesizing field-wide guidance reinforced that addressing cancer's complexity requires not just better algorithms, but better integration of computational innovation with biological understanding and clinical insight.