Introducing scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in-silico exploration

scCross enables efficient cross-modal data generation, multi-omic data simulation, and in-silico perturbations within and across different modalities.

The introduction of single-cell sequencing technology marks a new era in biological research, allowing scientists to analyze cellular heterogeneity with unprecedented detail. This advancement reveals complex cellular dynamics and has had significant impacts on fields such as cancer biology, neurobiology, and drug discovery. However, the data generated by these technologies are often highly complex and diverse, leading many existing computational tools to provide a limited perspective focused on specific data modalities. This limitation hinders a comprehensive understanding of the cellular landscape.

Challenges in Multi-Omics Data Integration and Generation

Integrating single-cell multi-omics data effectively remains a significant challenge. Many existing methods depend on matched multi-omics datasets, which are often difficult to obtain, and this dependence limits the scope of analyses. As a result, unmatched data are integrated poorly, and noise and information loss are hard to manage. Even methods designed to handle multiple data modalities face persistent challenges, such as extracting common features across modalities and handling nonlinear transformations. The imbalance in the availability of different omics data types complicates the issue further; for instance, single-cell epigenomics data are often far scarcer than transcriptomics data. This scarcity not only hinders multi-omics analysis but also limits the potential for discovering comprehensive biological insights. These challenges highlight the need for more robust and flexible approaches to multi-omics data integration and generation.

A Unified Method for Integration, Generation, Perturbation, and Downstream Analysis

To address these challenges, we propose scCross. The method integrates single-cell multi-omics data and is distinctive in its ability to generate cross-modal single-cell data. This capability bridges abundant and scarce data modalities, allowing for a more comprehensive depiction of cellular states. Another key feature of scCross is its high-fidelity simulation of single-cell multi-omics data and its support for computational perturbations. These features enable virtual experiments on cellular interventions, built on the integrated data, to explore potential strategies for cellular manipulation. By offering insight into cross-modal cellular dynamics, scCross extends what single-cell multi-omics data can be used for.

Integrating Multi-Omics Using Deep Generative Frameworks

The scCross model for integrating and generating single-cell multi-omics data uses a deep generative framework that combines variational autoencoders (VAEs) and generative adversarial networks (GANs). This framework supports integration of single-cell multi-omics data, cross-modal data generation, multi-omics data simulation, and computational perturbations within and across modalities. The process begins by training a VAE for each modality to capture low-dimensional cell embeddings, enriched with gene set vectors for additional biological information. These embeddings are then integrated into a common latent space, with a Jensen-Shannon (JS) divergence loss applied to minimize differences in data distributions across omics layers. GANs are subsequently employed to fuse the modalities within this joint latent space. To further refine the integration, mutual nearest neighbor (MNN) cell pairs are used as anchors that guide the alignment, so that embeddings of the same or similar cells from different modalities remain close in the joint latent space. This MNN-guided alignment yields a coordinated distribution of the modalities in the latent space and supports robust multi-omics data integration.
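The two alignment ingredients named above, the JS divergence and MNN anchor pairs, can be illustrated with a minimal NumPy sketch. This is not the scCross implementation: the function names `js_divergence` and `mutual_nearest_neighbors`, the use of discrete distributions, the Euclidean distance, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mutual_nearest_neighbors(emb_a, emb_b, k=1):
    """Return (i, j) pairs where cell i of modality A and cell j of modality B
    are each among the other's k nearest neighbors (Euclidean distance)."""
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    # Pairwise squared distances, shape (n_a, n_b).
    d = ((emb_a[:, None, :] - emb_b[None, :, :]) ** 2).sum(axis=2)
    nn_ab = np.argsort(d, axis=1)[:, :k]    # for each A cell: k nearest B cells
    nn_ba = np.argsort(d, axis=0)[:k, :].T  # for each B cell: k nearest A cells
    pairs = []
    for i in range(d.shape[0]):
        for j in nn_ab[i]:
            if i in nn_ba[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy embeddings (hypothetical): two cells in modality A, three in modality B.
emb_a = np.array([[0.0, 0.0], [10.0, 10.0]])
emb_b = np.array([[0.1, 0.0], [10.0, 10.1], [5.0, 5.0]])
anchors = mutual_nearest_neighbors(emb_a, emb_b, k=1)
```

In this toy setting the third cell of modality B has no mutual partner, so it contributes no anchor. In a trained model, pairs like these would act as soft constraints that pull matched embeddings together, while a divergence term such as the JS divergence penalizes distribution-level mismatch between modalities.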

Cross-Modal Generation Using Bidirectional Alignment

Beyond the integration of single-cell multi-omics data, the model also enables cross-modal single-cell data generation and perturbations. The bidirectional aligner is essential for this process, decoding shared latent embeddings into different modalities. Once trained, the model can generate single-cell data across modalities by encoding data from one modality into the latent space and then decoding it into another. Additionally, it simulates multi-omics data generation and performs computational perturbations both within and across modalities, uncovering potential regulatory changes in cellular states. By consolidating single-cell multi-omics data into a unified latent space and supporting cross-modal integration, scCross lays the foundation for a wide range of single-cell multi-omics applications, particularly in scenarios where certain omics data are limited or unavailable. 
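The encode-then-decode route described above can be sketched in a few lines. This is a toy illustration under loud assumptions: the random linear maps stand in for trained encoder and decoder networks, and the names `encode_rna`, `decode_rna`, `decode_atac`, `cross_generate`, and `perturb_in_silico` are hypothetical, not part of the scCross package.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_peaks, n_latent = 6, 8, 3

# Hypothetical stand-ins for trained networks: random linear maps.
W_enc_rna = rng.normal(size=(n_genes, n_latent))
W_dec_rna = rng.normal(size=(n_latent, n_genes))
W_dec_atac = rng.normal(size=(n_latent, n_peaks))

def encode_rna(x):
    """Map expression profiles into the shared latent space z."""
    return x @ W_enc_rna

def decode_rna(z):
    """Decode z back to non-negative expression values (softplus)."""
    return np.logaddexp(0.0, z @ W_dec_rna)

def decode_atac(z):
    """Decode z into accessibility-like scores in [0, 1] (sigmoid)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec_atac)))

def cross_generate(x_rna):
    """Cross-modal generation: encode one modality, decode into another."""
    return decode_atac(encode_rna(x_rna))

def perturb_in_silico(x_rna, gene_idx, factor):
    """Scale one gene in a copy of the input, then decode both modalities."""
    x = x_rna.copy()
    x[:, gene_idx] *= factor
    z = encode_rna(x)
    return decode_rna(z), decode_atac(z)

# Toy count matrix (hypothetical): 4 cells by 6 genes.
x_rna = rng.poisson(2.0, size=(4, n_genes)).astype(float)
atac_hat = cross_generate(x_rna)
rna_ko, atac_ko = perturb_in_silico(x_rna, gene_idx=0, factor=0.0)
```

Here `factor=0.0` mimics a knockout of gene 0. Comparing `atac_ko` with `atac_hat` shows how a within-modality change propagates to the other modality through the shared latent space, which is the idea behind the cross-modal perturbations described above.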

Fig. 1: Overview of the scCross method. scCross employs variational autoencoders for each modality to capture latent cell embeddings for different single-cell omics. During single-cell data integration, the method incorporates biological priors, such as gene set matrices, as additional features. It then uses additional variational autoencoders and a bidirectional aligner to merge these enriched embeddings into a shared latent space z. The bidirectional aligner is crucial for cross-modal generation, with brown arrows indicating the transition from scRNA-seq to scATAC-seq. Mutual nearest neighbor priors ensure alignment accuracy. A discriminator maintains integration across omics while ensuring the generated data’s completeness and consistency. scCross provides a robust toolkit for single-cell data integration, supporting cross-modal data generation, single-cell data enhancement, multi-omics simulation, and computational perturbations, offering great flexibility in addressing various single-cell multi-omics challenges.

Validation of scCross

We validated scCross across diverse datasets encompassing various single-cell omics. The results indicate that scCross performs effectively in single-cell multi-omics data integration, cross-modal generation, multi-modal simulation, and computational perturbation tasks, as confirmed by multiple metrics and downstream analyses. These findings suggest that scCross is a valuable tool for facilitating single-cell multi-omic explorations and enhancing data utilization, supporting researchers in gaining deeper insights into single-cell multi-omics and cross-modal cellular dynamics.

Conclusion

The scCross method offers significant potential for the single-cell research community, addressing challenges that may be difficult to overcome with existing approaches. Its unique features and reliable performance make it a valuable tool for researchers engaged in single-cell multi-omics analysis. scCross facilitates the integration of different modalities, supports comprehensive data generation, and enables detailed simulation and perturbation, which could advance the study of complex biological systems. We encourage researchers to explore scCross and consider its application in their studies. For further details, please refer to our paper in Genome Biology (https://doi.org/10.1186/s13059-024-03338-z). 
