Behind the Paper

Introducing scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in-silico exploration

scCross enables efficient cross-modal data generation, multi-omic data simulation, and in-silico perturbations within and across different modalities.

Published in Bioengineering & Biotechnology, Genetics & Genomics, and Mathematical & Computational Engineering Applications

Aug 11, 2024

Xiuhui Yang

PhD Student, McGill University

The introduction of single-cell sequencing technology marked a new era in biological research, allowing scientists to analyze cellular heterogeneity in unprecedented detail. This advance reveals complex cellular dynamics and has had a significant impact on fields such as cancer biology, neurobiology, and drug discovery. However, the data generated by these technologies are highly complex and diverse, and many existing computational tools handle only specific data modalities. The resulting single-modality view hinders a comprehensive understanding of the cellular landscape.

Challenges in Multi-Omics Data Integration and Generation

Integrating single-cell multi-omics data effectively remains a significant challenge. Many existing methods depend on matched multi-omics datasets, which are often difficult to obtain, and this limits the scope of analyses. As a result, unmatched data are integrated poorly, and noise and information loss are hard to manage. Even methods designed to handle multiple data modalities face persistent challenges, such as extracting common features across modalities and managing nonlinear transformations. The imbalance in the availability of different omics data types further complicates the issue; for instance, single-cell epigenomics data are often far scarcer than transcriptomics data. This scarcity not only hinders multi-omics analysis but also limits the potential for discovering comprehensive biological insights. Together, these challenges highlight the need for more robust and flexible approaches to multi-omics data integration and generation.

Developing Integrated Methods for Integration, Generation, Perturbation, and Downstream Analysis

To address these challenges, we propose scCross. The method integrates single-cell multi-omics data and is distinctive in its ability to generate cross-modal single-cell data. This capability bridges rich and scarce data modalities, allowing for a more comprehensive depiction of cellular states. Another key feature of scCross is its high-fidelity simulation of single-cell multi-omics data and its support for computational perturbations. Building on the integrated data, this enables virtual experiments on cellular interventions and the exploration of potential strategies for cellular manipulation. By offering deeper insight into cross-modal cellular dynamics, scCross aims to enhance the utility of single-cell multi-omics research.

Integrating Multi-Omics Using Deep Generative Frameworks

The scCross model for integrating and generating single-cell multi-omics data leverages a deep generative framework that combines variational autoencoders (VAEs) and generative adversarial networks (GANs). This framework supports the integration of single-cell multi-omics data, cross-modal data generation, multi-omics data simulation, and computational perturbations within and across modalities. The process begins by training a VAE for each modality to capture low-dimensional cell embeddings, enriched with gene set vectors that supply additional biological information. These embeddings are then integrated into a common latent space, with a Jensen-Shannon (JS) divergence loss applied to minimize differences between the data distributions of the different omics. GANs are subsequently employed to fuse the modalities within this joint latent space. To further refine the integration, mutual nearest neighbor (MNN) cell pairs are used as anchors that guide the alignment, so that embeddings of the same or similar cells from different modalities remain close in the joint latent space. This MNN-guided alignment yields a coordinated distribution of the modalities in the shared space and supports robust, accurate multi-omics data integration.
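To make two of the ingredients above more concrete, here is a minimal NumPy sketch of the Jensen-Shannon divergence between two discrete distributions and of mutual nearest neighbor pairing between two sets of embeddings. It is an illustration only and not the scCross implementation: the function names `js_divergence` and `mutual_nearest_neighbors`, the brute-force Euclidean distance, and the parameter `k` are all assumptions made for this example.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mutual_nearest_neighbors(x, y, k=1):
    """Return (i, j) pairs where y[j] is among the k nearest neighbors of x[i]
    and x[i] is among the k nearest neighbors of y[j] (Euclidean distance)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise squared distances, shape (n_x, n_y).
    d = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    nn_of_x = np.argsort(d, axis=1)[:, :k]    # for each x[i]: indices into y
    nn_of_y = np.argsort(d, axis=0)[:k, :].T  # for each y[j]: indices into x
    pairs = []
    for i in range(x.shape[0]):
        for j in nn_of_x[i]:
            if i in nn_of_y[j]:
                pairs.append((i, int(j)))
    return pairs


# Toy embeddings standing in for two modalities (hypothetical values).
emb_a = np.array([[0.0, 0.0], [10.0, 10.0]])
emb_b = np.array([[0.1, 0.0], [10.0, 10.1], [5.0, 4.0]])
anchor_pairs = mutual_nearest_neighbors(emb_a, emb_b, k=1)
```

In this toy case the third point of `emb_b` is nobody's nearest neighbor, so it is left unpaired. In a real setting the anchors would come from learned embeddings, and an approximate nearest neighbor search would likely replace the brute-force distance matrix.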

Cross-Modal Generation Using Bidirectional Alignment

Beyond the integration of single-cell multi-omics data, the model also enables cross-modal single-cell data generation and perturbations. The bidirectional aligner is essential for this process, decoding shared latent embeddings into different modalities. Once trained, the model can generate single-cell data across modalities by encoding data from one modality into the latent space and then decoding it into another. Additionally, it simulates multi-omics data generation and performs computational perturbations both within and across modalities, uncovering potential regulatory changes in cellular states. By consolidating single-cell multi-omics data into a unified latent space and supporting cross-modal integration, scCross lays the foundation for a wide range of single-cell multi-omics applications, particularly in scenarios where certain omics data are limited or unavailable.
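The encode-then-decode idea described above can be sketched in a few lines. This toy example replaces the trained networks with random linear maps, so it only illustrates the data flow, not scCross itself: every name here (`encode_rna`, `decode_atac`, `cross_generate`, `perturb_and_decode`) and every dimension is a hypothetical choice made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_peaks, n_latent = 6, 8, 3

# Hypothetical stand-ins for trained encoders/decoders: random linear maps.
W_enc_rna = rng.normal(size=(n_genes, n_latent))
W_dec_rna = rng.normal(size=(n_latent, n_genes))
W_dec_atac = rng.normal(size=(n_latent, n_peaks))


def encode_rna(x):
    """Map RNA profiles (cells x genes) into the shared latent space."""
    return x @ W_enc_rna


def decode_rna(z):
    """Map latent embeddings back to the RNA modality."""
    return z @ W_dec_rna


def decode_atac(z):
    """Map latent embeddings to the ATAC modality."""
    return z @ W_dec_atac


def cross_generate(x_rna):
    """Cross-modal generation: encode from RNA, decode as ATAC."""
    return decode_atac(encode_rna(x_rna))


def perturb_and_decode(x_rna, gene_idx, scale):
    """In-silico perturbation: rescale one gene, then decode into both modalities."""
    x_pert = x_rna.copy()  # leave the input untouched
    x_pert[:, gene_idx] *= scale
    z = encode_rna(x_pert)
    return decode_rna(z), decode_atac(z)


# Toy count matrix: 4 cells x 6 genes.
cells = rng.poisson(2.0, size=(4, n_genes)).astype(float)
atac_hat = cross_generate(cells)

# Simulated knockout of gene 0 and its predicted effect across modalities.
rna_pert, atac_pert = perturb_and_decode(cells, gene_idx=0, scale=0.0)
delta_atac = atac_pert - atac_hat
```

Here `delta_atac` is the predicted change in the other modality caused by perturbing a single gene in the input. With a trained model, differences of this kind are what would be inspected for potential regulatory changes.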

Fig. 1: Overview of the scCross method. scCross employs variational autoencoders for each modality to capture latent cell embeddings for different single-cell omics. During single-cell data integration, the method incorporates biological priors, such as gene set matrices, as additional features. It then uses additional variational autoencoders and a bidirectional aligner to merge these enriched embeddings into a shared latent space z. The bidirectional aligner is crucial for cross-modal generation, with brown arrows indicating the transition from scRNA-seq to scATAC-seq. Mutual nearest neighbor priors ensure alignment accuracy. A discriminator maintains integration across omics while ensuring the generated data’s completeness and consistency. scCross provides a robust toolkit for single-cell data integration, supporting cross-modal data generation, single-cell data enhancement, multi-omics simulation, and computational perturbations, offering great flexibility in addressing various single-cell multi-omics challenges.

Validation of scCross

We validated scCross across diverse datasets encompassing various single-cell omics. The results indicate that scCross performs effectively in single-cell multi-omics data integration, cross-modal generation, multi-modal simulation, and computational perturbation tasks, as confirmed by multiple metrics and downstream analyses. These findings suggest that scCross is a valuable tool for facilitating single-cell multi-omic explorations and enhancing data utilization, supporting researchers in gaining deeper insights into single-cell multi-omics and cross-modal cellular dynamics.

Conclusion

The scCross method offers significant potential for the single-cell research community, addressing challenges that may be difficult to overcome with existing approaches. Its unique features and reliable performance make it a valuable tool for researchers engaged in single-cell multi-omics analysis. scCross facilitates the integration of different modalities, supports comprehensive data generation, and enables detailed simulation and perturbation, which could advance the study of complex biological systems. We encourage researchers to explore scCross and consider its application in their studies. For further details, please refer to our paper in Genome Biology (https://doi.org/10.1186/s13059-024-03338-z).
