Single-cell RNA sequencing (scRNA-seq) has revolutionized our understanding of cellular diversity and dynamics in development and disease. However, scRNA-seq data capture only a snapshot of gene expression at a single time point, which limits our ability to infer the past and future states of cells and the trajectories connecting them. RNA velocity, a computational method that leverages the kinetics of unspliced and spliced RNAs, has become a powerful tool for overcoming this limitation by predicting the direction and speed of cell-fate transitions. However, RNA velocity relies on assumptions, such as constant transcription, splicing and degradation rates, that may not hold in complex biological systems. Moreover, RNA velocity estimates can be affected by technical limitations such as the sparsity and biases of scRNA-seq data.
In our study, we introduce PhyloVelo, a novel computational framework that estimates the velocity of transcriptomic dynamics using monotonically expressed genes (MEGs), that is, genes whose expression either increases or decreases along the phylogenetic time of single cells. Phylogenetic time is defined as the number of cell divisions or mutations that separate a cell from a common progenitor. By integrating scRNA-seq data with lineage information, such as CRISPR/Cas9-based lineage tracing or static barcoding, PhyloVelo identifies MEGs and reconstructs a transcriptomic velocity field that points to the past states of cells on a low-dimensional embedding. PhyloVelo can be applied to diverse biological contexts such as normal development, tumor evolution and immune response.
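As a small illustration of the definition above, phylogenetic time can be computed as the distance from a cell to the root (the common progenitor) of its lineage tree, where each branch length counts the divisions or mutations on that branch. The sketch below assumes a hypothetical tree layout (a child-to-parent dictionary plus per-branch lengths); it is not PhyloVelo's own data structure.

```python
from typing import Dict, Hashable


def phylogenetic_time(node: Hashable,
                      parent: Dict[Hashable, Hashable],
                      branch_length: Dict[Hashable, float]) -> float:
    """Return the distance from `node` to the root of the lineage tree.

    Hypothetical layout: `parent` maps every non-root node to its parent,
    and `branch_length[n]` is the number of divisions (or mutations) on the
    branch leading into node n. The root has no entry in `parent`.
    """
    t = 0.0
    while node in parent:          # walk upward until the root is reached
        t += branch_length[node]
        node = parent[node]
    return t


# Toy lineage: progenitor P divides into A and B; A divides into C and D.
parent = {"A": "P", "B": "P", "C": "A", "D": "A"}
branch_length = {"A": 1, "B": 1, "C": 1, "D": 1}  # one division per branch

for cell in ["B", "C", "D"]:
    print(cell, phylogenetic_time(cell, parent, branch_length))
```

In this toy tree, B is one division away from the progenitor and C and D are two divisions away. With mutation-based lineage tracing, the branch lengths would instead be mutation counts, and they need not be integers after tree reconstruction.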
The idea of PhyloVelo originated from the observation that closely related cells often have similar expression levels for certain genes, even when they belong to different cell types. We therefore assumed that genes can be divided into two types, cell type-related and cell type-unrelated, and that the expression of cell type-unrelated genes is highly correlated with lineage relationships. Among the latter, genes with specific expression patterns (such as the MEGs in this article) can serve as “clocks” that indicate the developmental state of cells. However, gene expression data are usually very noisy, which makes these “clock” genes difficult to identify. To address this, we denoise the expression data with zero-inflated negative binomial, negative binomial and Gaussian models, and we refer to the denoised expression as latent expression.
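The three noise models are named but not specified in detail here. As a hedged illustration of the first one, the sketch below writes out the zero-inflated negative binomial (ZINB) log-likelihood for count data. It uses the common mean/inverse-dispersion parameterization (mean `mu`, inverse dispersion `theta`, dropout probability `pi`) and then fits one gene's mean by a coarse grid search. The function names, the fixed `theta` and `pi`, and the idea of reading log(mu) as a denoised level are assumptions made for this example. They do not necessarily reflect how PhyloVelo defines latent expression.

```python
import math


def nb_logpmf(k: int, mu: float, theta: float) -> float:
    """Log-probability of count k under a negative binomial with mean mu > 0
    and inverse dispersion theta > 0 (variance = mu + mu**2 / theta)."""
    return (math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
            + theta * math.log(theta / (theta + mu))
            + k * math.log(mu / (theta + mu)))


def zinb_logpmf(k: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is an extra (dropout)
    zero; otherwise it is drawn from NB(mu, theta)."""
    if k == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_logpmf(k, mu, theta)


def zinb_loglik(counts, mu, theta, pi):
    """Total log-likelihood of counts that share the same (mu, theta, pi)."""
    return sum(zinb_logpmf(k, mu, theta, pi) for k in counts)


# Made-up UMI counts for one gene; theta and pi are held fixed for simplicity.
counts = [0, 0, 3, 5, 0, 4, 6, 2]
grid = [0.5 * i for i in range(1, 41)]          # candidate means 0.5 .. 20.0
mu_hat = max(grid, key=lambda m: zinb_loglik(counts, m, theta=2.0, pi=0.3))
print(mu_hat, math.log(mu_hat))
```

A real denoising model would estimate `theta` and `pi` jointly, and typically per cell as well, with a proper optimizer instead of a grid. The sketch only shows how the zero-inflation term separates dropout zeros from genuinely low expression.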
To formalize this hypothesis, we modeled the change in each gene's latent expression along phylogenetic time as a diffusion process with drift, and took the estimated drift coefficient as the velocity of that gene. Once enough MEGs have been accurately identified, they can be used to construct the velocity field of cell differentiation. To validate the model, we first applied PhyloVelo to simulated data covering various lineage structures, including linear, bifurcating and convergent differentiation. PhyloVelo robustly identified MEGs and accurately recovered the expected trajectories in all scenarios. We also benchmarked PhyloVelo on real lineage tracing data from C. elegans embryos, whose invariant lineage tree is completely known, and found that PhyloVelo outperformed RNA velocity in recapitulating the actual developmental order.
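To make the drift coefficient concrete, assume that latent expression z follows Brownian motion with drift v and diffusion rate σ² along the lineage tree, starting from z0 at the root. A cell at phylogenetic time t_i then has E[z_i] = z0 + v·t_i, and Cov(z_i, z_j) = σ²·s_ij, where s_ij is the phylogenetic time of the most recent common ancestor of cells i and j (so s_ii = t_i). Under these assumptions, v can be estimated by generalized least squares (GLS), and σ² cancels out of the point estimate. The sketch below is a standard phylogenetic GLS in plain Python with hypothetical function names. It illustrates the idea and is not PhyloVelo's estimator.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):           # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_drift(times, shared, z):
    """GLS estimate of (z0, v) for E[z_i] = z0 + v*t_i, Cov(z_i, z_j) ∝ shared[i][j].

    Requires cells at different phylogenetic times; otherwise z0 and v are
    not separately identifiable.
    """
    n = len(times)
    ones = [1.0] * n
    ci_one = solve(shared, ones)              # C^{-1} 1
    ci_t = solve(shared, times)               # C^{-1} t
    ci_z = solve(shared, z)                   # C^{-1} z

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Normal equations (X^T C^{-1} X) beta = X^T C^{-1} z with X = [1, t]
    a11, a12, a22 = dot(ones, ci_one), dot(ones, ci_t), dot(times, ci_t)
    b1, b2 = dot(ones, ci_z), dot(times, ci_z)
    det = a11 * a22 - a12 * a12
    z0 = (a22 * b1 - a12 * b2) / det
    v = (a11 * b2 - a12 * b1) / det
    return z0, v


# Toy tree: root -> A (1 division) -> c1 (+1), c2 (+2); root -> c3 (3 divisions).
times = [2.0, 3.0, 3.0]                       # phylogenetic times of c1, c2, c3
shared = [[2.0, 1.0, 0.0],                    # s_ij = time of MRCA of cells i, j
          [1.0, 3.0, 0.0],
          [0.0, 0.0, 3.0]]
z = [1.9, 2.6, 2.4]                            # made-up latent expression of one gene
print(estimate_drift(times, shared, z))
```

A clearly positive or negative v marks a candidate increasing or decreasing MEG. In practice, one would also estimate σ² from the GLS residuals to judge whether v differs significantly from zero.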
We then applied PhyloVelo to several CRISPR/Cas9-based lineage tracing datasets from mouse and human, covering mouse embryo development, in vitro hematopoiesis, lung tumor evolution and intratumoral T cell dynamics. We demonstrated that PhyloVelo could resolve complex lineage structures and quantify state-transition probabilities between cell types. We also showed that PhyloVelo could circumvent the multiple-rate kinetics issue of RNA velocity and transfer MEGs across independent datasets from similar biological conditions. Interestingly, we found that MEGs across tissues and organisms were strongly enriched in ribosome-mediated processes, suggesting an internal clock-like gene expression program during cell proliferation and differentiation.
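How the state-transition probabilities are computed is not described here. One widely used approach in velocity tools, similar in spirit to the scVelo velocity graph, is a cosine-similarity kernel. For each cell i, a neighbor j receives weight exp(cos(x_j - x_i, v_i)/σ), where x is expression and v is the cell's velocity. The weights are normalized per cell and then averaged within cell types. The sketch below assumes this approach, which may differ from PhyloVelo's. The function names and the kernel width `sigma` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def transition_probs(X, V, neighbors, sigma=0.1):
    """Row-normalized kernel P[i][j] ∝ exp(cos(x_j - x_i, v_i) / sigma).

    Assumes every cell has at least one neighbor.
    """
    P = []
    for i, nbrs in enumerate(neighbors):
        w = {}
        for j in nbrs:
            d = [xj - xi for xj, xi in zip(X[j], X[i])]   # displacement i -> j
            w[j] = math.exp(cosine(d, V[i]) / sigma)
        total = sum(w.values())
        P.append({j: wj / total for j, wj in w.items()})
    return P


def celltype_transitions(P, labels):
    """Average, over cells of type a, the probability mass sent to type b."""
    mass = defaultdict(lambda: defaultdict(float))
    count = defaultdict(int)
    for i, row in enumerate(P):
        count[labels[i]] += 1
        for j, p in row.items():
            mass[labels[i]][labels[j]] += p
    return {a: {b: m / count[a] for b, m in row.items()}
            for a, row in mass.items()}


# Toy 1-D example: every cell's velocity points toward larger x.
X = [[0.0], [1.0], [-1.0]]
V = [[1.0], [1.0], [1.0]]
neighbors = [[1, 2], [0], [0]]
labels = ["A", "B", "A"]
T = celltype_transitions(transition_probs(X, V, neighbors), labels)
print(T)
```

A smaller `sigma` makes transitions concentrate on the neighbor best aligned with the velocity. A larger one spreads probability more evenly across neighbors.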
Some of the challenges we faced in this project concerned the quality and availability of single-cell lineage tracing data. For example, we had to deal with the low coverage and high dropout rates of scRNA-seq data, which could affect both the identification of MEGs and the estimation of phylogenetic time. We also had to account for different sources of noise and variation in lineage tracing methods, such as off-target effects of CRISPR/Cas9 editing or the stochasticity of barcode evolution. To address these issues, we applied very stringent criteria when screening for MEGs, and we used simulated data to show that PhyloVelo remains robust on lower-quality datasets.
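The stringent screening criteria are not spelled out here. Purely as a hypothetical illustration of this kind of filter, the sketch below keeps a gene only if it passes two checks. First, it must be detected (non-zero) in a minimum fraction of cells, which guards against dropout. Second, its expression must be strongly rank-correlated (Spearman) with phylogenetic time in either direction. The thresholds `min_abs_rho` and `min_detect` are invented for the example. Note that a plain rank correlation ignores the shared ancestry between cells, which a model-based test on the drift coefficient would account for.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation (Pearson correlation of ranks); 0.0 if constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def screen_megs(expr, times, min_abs_rho=0.8, min_detect=0.5):
    """Return {gene: rho} for genes passing both hypothetical filters."""
    megs = {}
    for gene, values in expr.items():
        detect = sum(v > 0 for v in values) / len(values)
        if detect < min_detect:               # too many zeros: likely dropout
            continue
        rho = spearman(values, times)
        if abs(rho) >= min_abs_rho:           # increasing or decreasing
            megs[gene] = rho
    return megs


# Made-up expression of four genes in five cells ordered by phylogenetic time.
times = [1, 2, 3, 4, 5]
expr = {"up": [1, 2, 3, 4, 5], "down": [5, 4, 3, 2, 1],
        "flat": [2, 2, 2, 2, 2], "rare": [0, 0, 0, 0, 9]}
print(screen_megs(expr, times))
```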
We believe that PhyloVelo provides a novel perspective and a complementary approach to RNA velocity for studying cellular dynamics from lineage-resolved scRNA-seq data. By exploiting both transcriptomic and lineage information, PhyloVelo can reveal hidden features of cell fate transitions that are otherwise difficult to capture by gene expression alone. We hope that PhyloVelo will facilitate new discoveries in developmental biology and disease progression.