dualGSEA: A New Tool for Unlocking Insights from Transcriptomic Data

Introduction:

Advances in sequencing technologies and the increasing availability of open, accessible data have transformed cancer research. Researchers now have access to a wide range of molecular datasets derived from tumour tissue, alongside tools such as gene set enrichment analysis (GSEA) that help with interpretation of the data. These tools are used in combination with collections of gene signatures to highlight biological pathways involved in disease. While GSEA is effective at identifying patterns across groups of samples, single sample enrichment methods often produce results that are inconsistent with those of pairwise methods when the same data are analysed at the individual sample level.

Our study addressed this issue in a colon cancer (CC) patient cohort, examining the biology of relapse versus non-relapse disease. The data were analysed with both pairwise GSEA and single sample methods, and the outputs were compared. We found that most results, while statistically significant in the pairwise methods, did not translate to meaningful biological differences when the same signatures were assessed using single sample methods, as the groups appeared biologically similar. To address this, we created dualGSEA (https://github.com/MolecularPathologyLab/Bull-et-al), a bioinformatics tool that helps researchers compare pairwise and single sample results side by side. By showing both the statistical and biological relevance of findings, dualGSEA makes it easier to identify true biological differences between groups.

Methods overview:

In this study, we investigated the similarity between different gene set enrichment methods by comparing pairwise GSEA and single sample methods. Using a CC transcriptomic dataset, we evaluated the consistency and biological relevance of results from both methods for the same biological gene signatures. The first aim of this study was to evaluate how the choice of ranking metric from differential expression analysis could influence downstream GSEA results, and then to compare tools for pairwise GSEA. Additionally, we aimed to determine whether single sample analyses would complement the findings from pairwise methods or reveal differences in the interpretation of results.
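
As a rough illustration of why the ranking metric matters, the Python sketch below orders the same differential expression results by two commonly used metrics: the log2 fold change and a signed -log10 p-value. The gene names, values, and function names are hypothetical, and these two metrics are assumed here purely as examples; they are not necessarily the ones compared in the study.

```python
import math

# Hypothetical differential expression results: (gene, log2 fold change, p-value).
de_results = [
    ("G1", 2.0, 0.04),
    ("G2", 0.5, 1e-6),
    ("G3", -1.5, 0.001),
    ("G4", -0.2, 0.5),
]


def signed_log_p(log2fc, p_value, floor=1e-300):
    """Signed -log10(p): direction from the fold change, magnitude from the p-value."""
    sign = 1.0 if log2fc >= 0 else -1.0
    # The floor avoids log10(0) when a p-value underflows to zero.
    return sign * -math.log10(max(p_value, floor))


def rank_genes(results, metric):
    """Return gene names ordered from the highest to the lowest metric value."""
    scored = [(gene, metric(log2fc, p)) for gene, log2fc, p in results]
    ordered = sorted(scored, key=lambda gene_score: gene_score[1], reverse=True)
    return [gene for gene, _ in ordered]


by_fold_change = rank_genes(de_results, lambda log2fc, p: log2fc)
by_signed_p = rank_genes(de_results, signed_log_p)

print(by_fold_change)  # ['G1', 'G2', 'G4', 'G3']
print(by_signed_p)     # ['G2', 'G1', 'G4', 'G3']
```

In this toy example the two metrics swap the top two genes, which is the kind of change in ordering that can shift a downstream preranked enrichment result.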

Pairwise GSEA was performed using three tools: two R packages, clusterProfiler and fgsea, and the online tool GenePattern’s GSEAPreranked (Figure 1A). Three gene signatures that were consistently significant in the pairwise analysis were then tested using two single sample methods, single sample GSEA (ssGSEA) and gene set variation analysis (GSVA) (Figure 1B). Visualisation of the single sample results revealed that a high proportion of the samples in the two groups of interest overlapped. This led us to develop dualGSEA, a tool that allows researchers to perform both pairwise and single sample analyses and visualise the results to see whether the gene signatures are biologically relevant for the samples in the groups of interest.
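
For readers unfamiliar with how a preranked pairwise analysis scores a signature, the following Python sketch computes the classic weighted running-sum enrichment score on a toy ranked list. It is a simplified illustration only: it omits the permutation testing and normalisation that tools such as fgsea perform, and the gene names, metric values, and function name are made up for the example.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for a preranked gene list.

    ranked   : list of (gene, metric) pairs, in any order
    gene_set : set of gene names forming the signature
    weight   : exponent applied to |metric| for genes in the signature
    """
    ordered = sorted(ranked, key=lambda gene_metric: gene_metric[1], reverse=True)
    in_set = [gene in gene_set for gene, _ in ordered]
    n_hits = sum(in_set)
    n_miss = len(ordered) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("signature must contain some, but not all, ranked genes")

    hit_total = sum(abs(m) ** weight for (_, m), hit in zip(ordered, in_set) if hit)
    if hit_total == 0:
        raise ValueError("signature genes all have a zero ranking metric")

    running = 0.0
    score = 0.0
    for (_, metric), hit in zip(ordered, in_set):
        if hit:
            # Step up in proportion to the gene's weighted metric.
            running += abs(metric) ** weight / hit_total
        else:
            # Step down by a constant amount for genes outside the signature.
            running -= 1.0 / n_miss
        # Keep the largest deviation from zero seen so far.
        if abs(running) > abs(score):
            score = running
    return score


# Hypothetical ranked list (gene, ranking metric).
ranked_genes = [("A", 3.0), ("B", 2.0), ("C", 1.0),
                ("D", -1.0), ("E", -2.0), ("F", -3.0)]

top_signature = enrichment_score(ranked_genes, {"A", "B"})
bottom_signature = enrichment_score(ranked_genes, {"E", "F"})
print(top_signature, bottom_signature)
```

A signature concentrated at the top of the ranking gives a score near 1, and one concentrated at the bottom gives a score near -1. The score summarises the groups as a whole and says nothing about how individual samples are distributed.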

 

 

Figure 1. Pairwise and single sample workflow. (A) Overview of pairwise analysis and description of samples used in the study. (B) Overview of single sample workflow and description of samples and methods used in the study.

The dualGSEA tool uses the fgsea R package to perform pairwise analysis and the GSVA R package to perform ssGSEA as the single sample method of choice. We implemented these packages in dualGSEA because our study found minimal differences between the pairwise tools and between the two single sample methods. Users can input their transcriptomic data, define their groups of interest and run the analysis with their chosen gene signatures.

dualGSEA provides comprehensive visual outputs for both pairwise and single sample results including bar plots, enrichment plots, histograms, ridge plots, ROC curves and waterfall plots (Figure 2). These visualisations allow the user to accurately interpret their data and derive meaningful insights into the different underlying biology in their groups of interest.

Figure 2. Schematic of dualGSEA. Inputs for dualGSEA include an expression data matrix of transcriptomic data, the sample labels and gene signatures. Pairwise outputs from dualGSEA include a ranked list of genes from differential expression analysis, bar plots of enrichment results and enrichment plots. Single sample outputs include ridge plots, histograms, waterfall plots and ROC curves.
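
One way to put a number on how much two groups overlap on a single sample score is the area under the ROC curve (AUC), where 0.5 indicates complete overlap and 1.0 perfect separation. The Python sketch below computes it by comparing every pair of samples across the two groups. The scores, group labels, and function name are hypothetical, and this is not a description of how dualGSEA itself produces its ROC curves.

```python
def roc_auc(scores_group_a, scores_group_b):
    """AUC as the fraction of cross-group pairs where group A scores higher.

    Ties count as half a win, which matches the Mann-Whitney U formulation.
    """
    if not scores_group_a or not scores_group_b:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for a in scores_group_a:
        for b in scores_group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_group_a) * len(scores_group_b))


# Hypothetical single sample enrichment scores for one signature.
relapse = [0.10, 0.25, 0.30, 0.45]
non_relapse = [0.05, 0.20, 0.35, 0.40]

overlapping_auc = roc_auc(relapse, non_relapse)
separated_auc = roc_auc([0.8, 0.9], [0.1, 0.2])

print(overlapping_auc)  # 0.5625
print(separated_auc)    # 1.0
```

An AUC close to 0.5, as in the first toy comparison, means the signature barely distinguishes individual samples in one group from those in the other, whatever the pairwise p-value says.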

Conclusions and Implications of this Study:

This study highlighted the importance of visualisation techniques to complement statistical results and validate biological relevance. We showed how a highly significant pairwise result does not always translate to a significant single sample result, with visualisation playing a key role in uncovering this inconsistency.

To address these challenges, we developed dualGSEA, an open-source R-based function that integrates pairwise and single sample analyses, providing comprehensive visual outputs to aid data interpretation. By combining statistical and biological insights, dualGSEA helps ensure that meaningful conclusions are drawn from transcriptional data. This balanced approach not only reduces the limitations of using either method on its own but also supports researchers in uncovering nuanced biological insights and improving the strength of their findings.
