Introducing scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in-silico exploration

scCross enables efficient cross-modal data generation, multi-omic data simulation, and in-silico perturbations within and across different modalities.

The introduction of single-cell sequencing technology marks a new era in biological research, allowing scientists to analyze cellular heterogeneity in unprecedented detail. This advancement reveals complex cellular dynamics and has had a significant impact on fields such as cancer biology, neurobiology, and drug discovery. However, the data generated by these technologies are highly complex and diverse, and many existing computational tools offer only a limited perspective focused on a specific data modality. This limitation hinders a comprehensive understanding of the cellular landscape.

Challenges in Multi-Omics Data Integration and Generation

Integrating single-cell multi-omics data effectively remains a significant challenge. Many existing methods depend on matched multi-omics datasets, which are often difficult to obtain, and this dependence limits the scope of analysis: unmatched data are integrated poorly, and noise and information loss are hard to manage. Even methods designed to handle multiple data modalities face persistent challenges, such as extracting common features across modalities and handling nonlinear transformations. The imbalance in the availability of different omics data types complicates matters further; for instance, single-cell epigenomics data are often far scarcer than their transcriptomics counterparts. This scarcity hinders multi-omics analysis and limits the biological insights that can be drawn from it. These challenges highlight the need for more robust and flexible approaches to multi-omics data integration and generation.

Developing Integrated Methods for Integration, Generation, Perturbation, and Downstream Analysis

To address these challenges, we propose scCross. The method integrates single-cell multi-omics data and is distinctive in its ability to generate cross-modal single-cell data. This capability bridges abundant and scarce data modalities, allowing for a more comprehensive depiction of cellular states. Another key feature of scCross is its high-fidelity simulation of single-cell multi-omics data and its support for computational perturbations. Together, these enable virtual experiments on cellular interventions, built on the integrated data, to explore potential strategies for cellular manipulation. By offering insight into cross-modal cellular dynamics, scCross extends what can be done with single-cell multi-omics data.

Integrating Multi-Omics Using Deep Generative Frameworks

The scCross model for integrating and generating single-cell multi-omics data leverages a deep generative framework that combines variational autoencoders (VAEs) and generative adversarial networks (GANs). This framework supports integration of single-cell multi-omics data, cross-modal data generation, multi-omics data simulation, and computational perturbations within and across modalities. The process begins by training a VAE for each modality to capture low-dimensional cell embeddings, enriched with gene set vectors for additional informational depth. These embeddings are then integrated into a common latent space, with a Jensen-Shannon (JS) divergence loss applied to minimize differences between the data distributions of the different omics layers. GANs are subsequently employed to fuse the modalities within this joint latent space. To further refine the integration, mutual nearest neighbor (MNN) cell pairs are used as anchors that guide the alignment, so that embeddings of the same or similar cells from different modalities remain close in the joint latent space. This MNN-guided alignment yields a coordinated distribution of the modalities in the latent space and a more robust, accurate integration.
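To make the idea of MNN anchors concrete, here is a minimal sketch in Python with NumPy. It is not the scCross implementation: the function name `mutual_nearest_neighbors`, the use of Euclidean distance, and the parameter `k` are assumptions chosen for illustration. It only shows what it means for two cells from different modalities to be mutual nearest neighbors in a shared embedding space.

```python
import numpy as np

def mutual_nearest_neighbors(emb_a, emb_b, k=1):
    """Return (i, j) pairs where cell i of modality A and cell j of
    modality B are each among the other's k nearest neighbors.

    Hypothetical helper for illustration; distance metric is Euclidean.
    """
    # Pairwise squared Euclidean distances, shape (n_a, n_b)
    diff = emb_a[:, None, :] - emb_b[None, :, :]
    dist = (diff ** 2).sum(axis=-1)

    # For every A cell, indices of its k nearest B cells: shape (n_a, k)
    nn_a_to_b = np.argsort(dist, axis=1)[:, :k]
    # For every B cell, indices of its k nearest A cells: shape (n_b, k)
    nn_b_to_a = np.argsort(dist, axis=0)[:k, :].T

    pairs = []
    for i in range(emb_a.shape[0]):
        for j in nn_a_to_b[i]:
            # Keep the pair only if the neighbor relation is mutual
            if i in nn_b_to_a[j]:
                pairs.append((i, int(j)))
    return pairs

# Toy embeddings: two cells in modality A, three in modality B
emb_a = np.array([[0.0, 0.0], [10.0, 10.0]])
emb_b = np.array([[0.1, 0.0], [10.0, 10.2], [4.0, 5.0]])
anchors = mutual_nearest_neighbors(emb_a, emb_b, k=1)
print(anchors)
```

In this toy example the third cell of modality B has no mutual partner, so it does not become an anchor. Pairs found this way could then serve as anchors whose embeddings an alignment loss pulls together.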

Cross-Modal Generation Using Bidirectional Alignment

Beyond the integration of single-cell multi-omics data, the model also enables cross-modal single-cell data generation and perturbations. The bidirectional aligner is essential for this process, decoding shared latent embeddings into different modalities. Once trained, the model can generate single-cell data across modalities by encoding data from one modality into the latent space and then decoding it into another. It can also simulate multi-omics data and perform computational perturbations both within and across modalities, uncovering potential regulatory changes in cellular states. By consolidating single-cell multi-omics data into a unified latent space and supporting cross-modal generation, scCross lays the foundation for a wide range of single-cell multi-omics applications, particularly in scenarios where certain omics data are limited or unavailable.
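The encode-then-decode route described above can be sketched in a few lines of Python. Everything here is a stand-in: the random linear maps `W_enc_rna` and `W_dec_atac` take the place of trained networks, and the function names, dimensions, and the choice to perturb a single input gene are assumptions for illustration, not the scCross API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_peaks, n_latent = 5, 50, 80, 8

# Hypothetical stand-ins for a trained RNA encoder and ATAC decoder
W_enc_rna = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_dec_atac = rng.normal(scale=0.1, size=(n_latent, n_peaks))

def encode_rna(x_rna):
    """Map expression profiles into the shared latent space."""
    return x_rna @ W_enc_rna

def decode_atac(z):
    """Map latent embeddings to per-peak accessibility scores in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec_atac)))

def rna_to_atac(x_rna):
    """Cross-modal generation: encode one modality, decode another."""
    return decode_atac(encode_rna(x_rna))

# Toy log-normalized expression matrix (cells x genes)
x_rna = np.log1p(rng.poisson(2.0, size=(n_cells, n_genes)).astype(float))
atac_generated = rna_to_atac(x_rna)

# In-silico perturbation: raise one gene, then regenerate the other modality
gene_idx = 3
x_perturbed = x_rna.copy()
x_perturbed[:, gene_idx] += 1.0
atac_perturbed = rna_to_atac(x_perturbed)

# Peaks whose predicted accessibility shifts most under the perturbation
effect = np.abs(atac_perturbed - atac_generated).mean(axis=0)
top_peaks = np.argsort(effect)[::-1][:5]
print(atac_generated.shape, top_peaks)
```

With real trained networks in place of the random maps, comparing the generated profiles before and after a perturbation is one way to look for regulatory changes in the second modality.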

Fig. 1: Overview of the scCross method. scCross employs variational autoencoders for each modality to capture latent cell embeddings for different single-cell omics. During single-cell data integration, the method incorporates biological priors, such as gene set matrices, as additional features. It then uses additional variational autoencoders and a bidirectional aligner to merge these enriched embeddings into a shared latent space z. The bidirectional aligner is crucial for cross-modal generation, with brown arrows indicating the transition from scRNA-seq to scATAC-seq. Mutual nearest neighbor priors ensure alignment accuracy. A discriminator maintains integration across omics while ensuring the generated data’s completeness and consistency. scCross provides a robust toolkit for single-cell data integration, supporting cross-modal data generation, single-cell data enhancement, multi-omics simulation, and computational perturbations, offering great flexibility in addressing various single-cell multi-omics challenges.

Validation of scCross

We validated scCross across diverse datasets encompassing various single-cell omics. The results indicate that scCross performs effectively in single-cell multi-omics data integration, cross-modal generation, multi-modal simulation, and computational perturbation tasks, as confirmed by multiple metrics and downstream analyses. These findings suggest that scCross is a valuable tool for facilitating single-cell multi-omic explorations and enhancing data utilization, supporting researchers in gaining deeper insights into single-cell multi-omics and cross-modal cellular dynamics.

Conclusion

The scCross method offers significant potential for the single-cell research community, addressing challenges that may be difficult to overcome with existing approaches. Its unique features and reliable performance make it a valuable tool for researchers engaged in single-cell multi-omics analysis. scCross facilitates the integration of different modalities, supports comprehensive data generation, and enables detailed simulation and perturbation, which could advance the study of complex biological systems. We encourage researchers to explore scCross and consider its application in their studies. For further details, please refer to our paper in Genome Biology (https://doi.org/10.1186/s13059-024-03338-z). 
