Introducing scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in-silico exploration

scCross enables efficient cross-modal data generation, multi-omic data simulation, and in-silico perturbations within and across different modalities.

The introduction of single-cell sequencing technology marks a new era in biological research, allowing scientists to analyze cellular heterogeneity with unprecedented detail. This advancement reveals complex cellular dynamics and has had significant impacts on fields such as cancer biology, neurobiology, and drug discovery. However, the data generated by these technologies are often highly complex and diverse, leading many existing computational tools to provide a limited perspective focused on specific data modalities. This limitation hinders a comprehensive understanding of the cellular landscape.

Challenges in Multi-Omics Data Integration and Generation

Integrating single-cell multi-omics data effectively remains a significant challenge in the field. Many existing methods depend on matched multi-omics datasets, which are often difficult to obtain, and this limits the scope of analyses. As a result, unmatched data are poorly integrated, and noise and information loss are hard to manage. Even methods designed to handle multiple data modalities face persistent challenges, such as extracting common features across modalities and managing nonlinear transformations. The imbalance in the availability of different omics data types further complicates this issue; for instance, single-cell epigenomics data are often far scarcer than transcriptomics data. This scarcity not only hinders multi-omics analysis but also limits the potential for discovering comprehensive biological insights. These challenges highlight the need for more robust and flexible approaches to multi-omics data integration and generation.

Developing Integrated Methods for Integration, Generation, Perturbation, and Downstream Analysis

To address these challenges, we propose scCross. This method integrates single-cell multi-omics data and is distinctive in its ability to generate cross-modal single-cell data. This capability bridges rich and scarce data modalities, allowing for a more comprehensive depiction of cellular states. Another key feature of scCross is its high-fidelity simulation of single-cell multi-omics data and its support for computational perturbations. Together, these enable virtual experiments on cellular interventions that build on the integrated data and explore potential strategies for cellular manipulation. By offering deep insights into cross-modal cellular dynamics, scCross enhances the utility of single-cell multi-omics research.

Integrating Multi-Omics Using Deep Generative Frameworks

The scCross model for integrating and generating single-cell multi-omics data leverages a deep generative framework that combines variational autoencoders (VAEs) and generative adversarial networks (GANs). This framework supports the integration of single-cell multi-omics data, cross-modal data generation, multi-omics data simulation, and computational perturbations within and across modalities. The process begins by training a VAE for each modality to capture low-dimensional cell embeddings, enriched with gene set vectors that add biological prior information. These embeddings are then integrated into a common latent space, with a Jensen-Shannon (JS) divergence loss applied to minimize differences in data distributions across the omics layers. GANs are subsequently employed to fuse the modalities within this joint latent space. To further refine the integration, mutual nearest neighbor (MNN) cell pairs are used as anchors that guide the alignment, so that embeddings of the same or similar cells from different modalities remain close in the joint latent space. This MNN-guided alignment yields a coordinated distribution of the modalities in the latent space and a more robust and accurate multi-omics integration.
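To make two of the ingredients above more concrete, here is a minimal NumPy sketch of a Jensen-Shannon divergence between two discrete distributions and of mutual nearest neighbor pairing between two sets of cells. It is an illustration only, not the scCross implementation: the function names `js_divergence` and `mutual_nearest_neighbors`, the use of Euclidean distance, and the assumption that both cell sets already live in a comparable feature space are all ours.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    Illustrative helper (hypothetical name), not part of scCross.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution; m > 0 wherever p > 0 or q > 0

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mutual_nearest_neighbors(x, y, k=1):
    """Return (i, j) pairs where y[j] is among the k nearest neighbors of x[i]
    and x[i] is among the k nearest neighbors of y[j] (Euclidean distance).

    Assumes x and y are (cells x features) arrays in a comparable feature space.
    """
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)  # shape (n_x, n_y)
    nn_xy = np.argsort(d, axis=1)[:, :k]    # for each x cell, its k nearest y cells
    nn_yx = np.argsort(d, axis=0)[:k, :].T  # for each y cell, its k nearest x cells
    return [
        (i, int(j))
        for i in range(x.shape[0])
        for j in nn_xy[i]
        if i in nn_yx[j]
    ]


# Toy data: two cells in one modality, three in the other.
x = np.array([[0.0, 0.0], [10.0, 10.0]])
y = np.array([[0.1, 0.0], [9.9, 10.0], [4.0, 4.0]])
pairs = mutual_nearest_neighbors(x, y, k=1)
```

In this toy example the third cell of `y` has no mutual partner, so it contributes no anchor; only mutually consistent pairs would be used to guide alignment.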

Cross-Modal Generation Using Bidirectional Alignment

Beyond the integration of single-cell multi-omics data, the model also enables cross-modal single-cell data generation and perturbations. The bidirectional aligner is essential for this process, decoding shared latent embeddings into different modalities. Once trained, the model can generate single-cell data across modalities by encoding data from one modality into the latent space and then decoding it into another. Additionally, it simulates multi-omics data generation and performs computational perturbations both within and across modalities, uncovering potential regulatory changes in cellular states. By consolidating single-cell multi-omics data into a unified latent space and supporting cross-modal integration, scCross lays the foundation for a wide range of single-cell multi-omics applications, particularly in scenarios where certain omics data are limited or unavailable. 
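The encode-then-decode route described above can be sketched as a toy data flow. The example below is a hedged illustration, not the scCross API: the random linear maps stand in for trained encoder and decoder networks, and every name (`encode_rna`, `decode_atac`, `cross_generate`, `perturb_and_decode`) and dimension is hypothetical. Because the weights are untrained, the outputs carry no biological meaning; only the shapes and the order of operations are the point.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_peaks, n_latent = 6, 8, 3

# Hypothetical stand-ins for trained networks: random linear maps.
W_enc_rna = rng.normal(size=(n_genes, n_latent))
W_dec_rna = rng.normal(size=(n_latent, n_genes))
W_dec_atac = rng.normal(size=(n_latent, n_peaks))


def encode_rna(x):
    """Map an expression profile into the shared latent space."""
    return x @ W_enc_rna


def decode_rna(z):
    """Decode latent embeddings into non-negative expression values."""
    return np.maximum(z @ W_dec_rna, 0.0)


def decode_atac(z):
    """Decode latent embeddings into accessibility scores between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec_atac)))


def cross_generate(x_rna):
    """Cross-modal generation: encode scRNA-seq, decode as scATAC-seq."""
    return decode_atac(encode_rna(x_rna))


def perturb_and_decode(x_rna, gene_idx, scale):
    """In-silico perturbation: rescale one gene, then decode both modalities."""
    x = x_rna.copy()
    x[:, gene_idx] *= scale
    z = encode_rna(x)
    return decode_rna(z), decode_atac(z)


# Four toy cells with Poisson-distributed counts.
x_rna = rng.poisson(2.0, size=(4, n_genes)).astype(float)
atac_generated = cross_generate(x_rna)
rna_knockout, atac_knockout = perturb_and_decode(x_rna, gene_idx=0, scale=0.0)
```

Setting `scale=0.0` mimics a knockout of one gene and `scale > 1` an overexpression; comparing the decoded profiles before and after is the kind of within- and cross-modality readout the text refers to.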

Fig. 1: Overview of the scCross method. scCross employs variational autoencoders for each modality to capture latent cell embeddings for different single-cell omics. During single-cell data integration, the method incorporates biological priors, such as gene set matrices, as additional features. It then uses additional variational autoencoders and a bidirectional aligner to merge these enriched embeddings into a shared latent space z. The bidirectional aligner is crucial for cross-modal generation, with brown arrows indicating the transition from scRNA-seq to scATAC-seq. Mutual nearest neighbor priors ensure alignment accuracy. A discriminator maintains integration across omics while ensuring the generated data’s completeness and consistency. scCross provides a robust toolkit for single-cell data integration, supporting cross-modal data generation, single-cell data enhancement, multi-omics simulation, and computational perturbations, offering great flexibility in addressing various single-cell multi-omics challenges.

Validation of scCross

We validated scCross across diverse datasets encompassing various single-cell omics. The results indicate that scCross performs effectively in single-cell multi-omics data integration, cross-modal generation, multi-modal simulation, and computational perturbation tasks, as confirmed by multiple metrics and downstream analyses. These findings suggest that scCross is a valuable tool for facilitating single-cell multi-omic explorations and enhancing data utilization, supporting researchers in gaining deeper insights into single-cell multi-omics and cross-modal cellular dynamics.

Conclusion

The scCross method offers significant potential for the single-cell research community, addressing challenges that may be difficult to overcome with existing approaches. Its unique features and reliable performance make it a valuable tool for researchers engaged in single-cell multi-omics analysis. scCross facilitates the integration of different modalities, supports comprehensive data generation, and enables detailed simulation and perturbation, which could advance the study of complex biological systems. We encourage researchers to explore scCross and consider its application in their studies. For further details, please refer to our paper in Genome Biology (https://doi.org/10.1186/s13059-024-03338-z). 
