Behind the Paper

Analysis of microarray and single-cell RNA-seq identifies gene co-expression, cell–cell communication, and tumor environment associated with metabolite interconversion enzyme in prostate cancer

Cancer does not have a universally agreed-upon description, but it is characterized by the unregulated proliferation of cells, which may result in the infiltration of the cell basement or the spread to other organs in the body.

Danial Hashemi Karoii Jun 03, 2025

3.1 DEGs screening

Using edgeR, we identified DEGs among the prostate cancer subgroups. Genes were retained if the adjusted P-value (FDR) was below 0.01 and the fold change was at least 2. Comparing the four prostate cancer subtypes in all six pairwise combinations yielded a total of 544 differentially expressed genes (Fig. 1A, Table 2 and Supplementary 1).
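The screening rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' edgeR workflow: it assumes the edgeR results have already been exported as (gene, log2 fold change, FDR) tuples, it reads "a minimum fold change of 2" as |log2FC| >= 1 in either direction, and the gene and subtype names are placeholders.

```python
import math
from itertools import combinations

FDR_CUTOFF = 0.01        # adjusted P-value threshold stated in the text
MIN_FOLD_CHANGE = 2.0    # minimum fold change stated in the text


def pairwise_comparisons(groups):
    """Return all unordered pairs of groups (four groups give six pairs)."""
    return list(combinations(groups, 2))


def is_deg(log2_fc, fdr):
    """True if a gene passes both cutoffs, whether up- or down-regulated."""
    return fdr < FDR_CUTOFF and abs(log2_fc) >= math.log2(MIN_FOLD_CHANGE)


def filter_degs(rows):
    """rows: iterable of (gene, log2_fc, fdr) tuples from one comparison."""
    return [gene for gene, log2_fc, fdr in rows if is_deg(log2_fc, fdr)]


# Hypothetical values for illustration only (not results from the study).
example_rows = [
    ("GENE_A", 1.8, 0.001),    # passes both cutoffs
    ("GENE_B", -1.2, 0.004),   # passes (down-regulated)
    ("GENE_C", 0.6, 0.0005),   # fold change too small
    ("GENE_D", 2.5, 0.03),     # FDR too high
]
subtypes = ["S1", "S2", "S3", "S4"]   # placeholder subtype labels
comparisons = pairwise_comparisons(subtypes)
degs = filter_degs(example_rows)
```

In practice the filter would be applied once per comparison and the resulting gene lists merged.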

Fig. 1

Differences in gene expression between individuals with PCa and healthy individuals. A The analysis includes four datasets: GSE55945, GSE29079, GSE104749 and GSE46602. B Changes in metabolite interconversion enzymes in PCa, C metabolite interconversion enzymes selected from among the other altered genes, and (D) heatmap of metabolite interconversion enzymes in PCa. Genes with increased expression (fold change greater than 1.5 and corrected P-value below 0.05) are shown as red dots, genes with decreased expression that meet the same criteria are shown as blue dots, and genes with no significant change in expression are shown as black dots

Table 2 CYP3A5, PDE8B, AOX1, BNIPL, FADS2, RRM2, ALDH3B2, and GSTM2 expression in PCa

3.2 Protein class sorting and finding metabolite interconversion enzyme and their related genes

Differential gene expression was observed between the prostatic and normal cell groups. PANTHER protein class analysis showed that the differentially expressed transcripts include a varied set of genes in the metabolite interconversion enzyme class. These genes are associated with several molecular activities, including hydrolase, isomerase, ligase, lyase, oxidoreductase, and transferase activity, and they also take part in the biological process of biological adhesion (GO:0022610). CYP3A5, PDE8B, AOX1, BNIPL, FADS2, RRM2, ALDH3B2, and GSTM2 were identified as key genes that significantly influence the prostate cancer microenvironment (Fig. 1B).

3.3 Gene co-expression analysis

Gene co-expression analysis was performed to identify genes associated with the selected targets: CYP3A5, PDE8B, AOX1, BNIPL, FADS2, RRM2, ALDH3B2, and GSTM2. Two tools, COXPRESdb and GeneMANIA, were used to construct gene co-expression networks and explore the relationships between the identified genes. A similarity matrix was created by calculating Pearson's correlation for all possible gene pairs. This matrix was then used to build a scale-free co-expression network, with the soft threshold power (β) chosen to ensure network stability. The matrix was then converted into a topological overlap matrix (TOM), which enabled further network analysis and identification of key gene associations.
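The chain from Pearson similarity to soft-thresholded adjacency to topological overlap can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: it uses the unsigned adjacency a_ij = |cor_ij|^β and the standard unsigned TOM formula, whereas the text does not say which network type was used. The expression vectors and β = 5 are example values only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency_matrix(expr, beta):
    """expr: one expression vector per gene.
    Unsigned adjacency a_ij = |cor_ij| ** beta, with a zero diagonal."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]           # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]     # diagonal is 1 by convention
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Hypothetical expression vectors for three genes across five samples.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly correlated with the first gene
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
adj = adjacency_matrix(expr, beta=5)
tom = topological_overlap(adj)
```

Raising correlations to a power suppresses weak associations, which is why the choice of β shapes the network. The dissimilarity 1 - TOM is what WGCNA-style workflows typically pass to hierarchical clustering.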

3.4 Identification of key DEGs through PPI analysis

The master genes were identified by network analysis in Cytoscape. The identified genes are displayed in Fig. 1C using the yFiles radial layout. The network distinguishes the differentially expressed genes (DEGs) by their connections, and the figure indicates that the 45 genes with the highest level of interaction have the most robust connections. CYP3A5, PDE8B, AOX1, BNIPL, FADS2, RRM2, ALDH3B2, and GSTM2 are pivotal regulators of several genes. Notable gene associations were detected in several configurations of the Cytoscape network, and the Cytoscape network analysis tool was used to verify the yFiles radial layout. Based on degree and betweenness centrality, the Centiscape plugin identified ALDH3B2, AOX1, ASPM, AURKA, AURKB, BIRC5, BUB1, CCNB1, CDC45, CDC6, CDK1, CDKN3, CDT1, CENPF, CENPM, CENPU, CKS2, CYP3A5, CYP4F2, DTL, E2F8, ECT2, EXO1, GINS1, GSTM2, GSTO2, HMMR, KIF18B, KIF4A, KIF9, MCM4, MELK, MYC, NCAPG, NUF2, NUSAP1, OIP5, PBK, PCLAF, PRC1, RACGAP1, RAVER2, RRM2, SHCBP1, SKA3, SPC25, TOP2A, TPX2, TRIP13, TYMS, UBE2C, UHRF1, and ZNF367 as the most significant co-expressed genes (Fig. 1C, D). The PPI network, built from STRING database data, was examined visually in Cytoscape. Using the CytoHubba plug-in, we identified hub genes that were either up-regulated or down-regulated, which yielded a set of 50 unique genes within the gene regulatory network. Fifteen hub target genes were identified as common, namely E2F8, ECT2, EXO1, GINS1, GSTM2, GSTO2, HMMR, KIF18B, KIF4A, KIF9, MCM4, MELK, MYC, NCAPG, and NUF2 (Fig. 2A, B).
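The two centrality measures named above can be sketched for an unweighted, undirected network as follows. This illustrates the measures themselves, not Centiscape's or CytoHubba's implementation: betweenness is computed with Brandes' algorithm and left unnormalised, and the five-node path graph uses placeholder node names rather than real interactions.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    """Number of direct neighbours of each node."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def betweenness(graph):
    """Brandes' algorithm: shortest paths passing through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}    # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair is visited from both endpoints, so halve the sums.
    return {v: c / 2.0 for v, c in score.items()}


# Placeholder path graph N1 - N2 - N3 - N4 - N5 (not real interactions).
graph = build_graph([("N1", "N2"), ("N2", "N3"), ("N3", "N4"), ("N4", "N5")])
deg = degree(graph)
btw = betweenness(graph)
ranked = sorted(graph, key=lambda v: (deg[v], btw[v]), reverse=True)
```

Ranking by degree first and betweenness second is one simple way to combine the two measures. The text does not state how they were weighted.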

Fig. 2

PPI analysis conducted with Enrich. A PPI network of metabolite interconversion enzymes and (B) PPI network in PCa

3.5 Gene co-expression analysis

Sample clustering, with the clustering height threshold set at 20, revealed five outlier samples, which were excluded from further analysis. Using module membership (MM) > 0.9 and gene significance (GS) > 0.3 as selection criteria, 18 hub genes were identified from the modules containing CYP3A5, ALDH3B2, and GSTM2, along with their co-expression partners PDE8B, AOX1, BNIPL, FADS2, and RRM2 (Fig. 3).
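The hub gene criterion can be expressed as a simple filter. This is a minimal sketch that assumes MM and GS have already been computed for each gene and that the cutoffs apply to absolute values, which is a common WGCNA convention but is not stated in the text. The gene names and values are hypothetical.

```python
def select_hub_genes(stats, mm_cutoff=0.9, gs_cutoff=0.3):
    """stats: dict mapping gene -> (module_membership, gene_significance).
    Returns the genes whose absolute MM and GS both exceed the cutoffs."""
    return sorted(
        gene for gene, (mm, gs) in stats.items()
        if abs(mm) > mm_cutoff and abs(gs) > gs_cutoff
    )


# Hypothetical values for illustration only.
example_stats = {
    "GENE_A": (0.95, 0.50),    # passes both cutoffs
    "GENE_B": (0.92, 0.10),    # GS too low
    "GENE_C": (0.60, 0.80),    # MM too low
    "GENE_D": (-0.97, -0.40),  # passes on absolute value
}
hub_genes = select_hub_genes(example_stats)
```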

Fig. 3

Gene–gene co-expression analysis and network. A scatter plot of Gene–gene co-expression analysis, B heatmap of DEG in gene–gene co-expression, and (C) gene–gene co-expression network

3.6 The biological process of enrichment analysis and its molecular functions

In the Enrich analysis, three GO terms were associated with up-regulated DEGs and three with down-regulated DEGs. Functional enrichment analysis showed that small molecule metabolic process (GO:0044281), cellular catabolic process (GO:0044248), and oxidation–reduction process (GO:0055114) were significantly enriched among both the up-regulated and the down-regulated DEGs (Fig. 4A, B). Among molecular function (MF) terms, the up-regulated DEGs were enriched for oxidoreductase activity (GO:0016491), oxidoreductase activity acting on CH or CH2 groups (GO:0016725), iron ion binding (GO:0005506), and catalytic activity (GO:0003824) (Fig. 4C, D).
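The kind of over-representation test behind such enrichment results can be sketched as follows. This is a generic one-sided hypergeometric test, shown only to illustrate the idea. It is an assumption that the tool used here scores terms in a comparable way, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `universe` genes, `annotated` of which carry the GO term."""
    upper = min(annotated, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 400 annotated to a term,
# 544 DEGs, 30 of which carry the term.
p_value = hypergeom_enrichment_p(20000, 400, 544, 30)
```

A small p-value means the term appears among the selected genes more often than random sampling would predict. Enrichment tools then correct such p-values for the number of terms tested.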

Fig. 4

Biological processes and molecular functions in the genes. A analysis of biological processes of PPI, B analysis of molecular functions of PPI, C analysis of biological processes of metabolite interconversion enzyme, and (D) analysis of molecular functions of metabolite interconversion enzyme

3.7 KEGG pathway enrichment re-analysis for 10 hub genes

To investigate the signaling pathways potentially associated with these 10 hub genes, we re-analyzed KEGG pathways using the DAVID tool (P < 0.05). The genes were strongly associated with the activity of six signaling pathways: the apoptotic, PI3K, EMT, ER, TSC/mTOR, and RAS/MAPK pathways. We analyzed the expression levels of these four genes using the TCGA database. Expression was significantly higher in both BC-adjacent and PCa patients than in healthy individuals. To better understand the underlying mechanism of these four genes in PCa, we performed co-expression data mining using the pc-GenExMiner tool. Figure 5 provides evidence of the up-regulation or down-regulation of all nine genes in PCa tissues, suggesting the presence of a signaling network.

Fig. 5

Identification of the signaling pathways linked to metabolite interconversion enzymes in the prostate cancer microenvironment. The signaling pathways involved in the prostate cancer environment include the apoptosis, PI3K, EMT, ER, TSC/mTOR, and RAS/MAPK pathways

3.8 Construction of the co-expression network and identification of modules

WGCNA was used to identify modules associated with the different subtypes of prostate cancer. Co-expression was assessed across the 544 differentially expressed genes. To choose the soft threshold power, we tested β values from 1 to 20 and, for each value, evaluated the scale independence of the co-expression network and its mean connectivity. Based on these results, β = 5 was chosen as the most appropriate power (Fig. 6A). The node degree distribution followed a power law, indicating that the network has a scale-free structure. With β = 5, a gene dendrogram was then built by hierarchical clustering. The overall expression of a module is summarized by its module eigengene (ME), the first principal component of that module. There are two approaches for examining the relationship between each module and the prostate cancer subtypes. Two modules were identified that correlated strongly with particular cancer subtypes (P < 0.01 and absolute correlation value > 0.75). In the module–trait analysis based on Spearman correlation, the blue (R = 0.8; P < 1e-168) and turquoise (R = −0.71; P = 1e-113) modules were statistically significant. Figures 6B and C illustrate the significance of the gene interactions in the turquoise and blue modules. In addition, our examination of single-cell data revealed eight crucial genes (CYP3A5, PDE8B, AOX1, BNIPL, FADS2, RRM2, ALDH3B2, and GSTM2) that have a substantial connection with PCa.
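The module–trait step can be sketched as a Spearman correlation between each module eigengene and a subtype indicator. This is a minimal pure-Python illustration, not the WGCNA implementation. It assumes the trait is coded as 0/1 per sample, applies only the |R| > 0.75 cutoff (P-values are omitted), and uses hypothetical eigengene values.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2.0 + 1.0        # mean of positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical module eigengenes across eight samples, and a 0/1 subtype flag.
trait = [0, 0, 0, 0, 1, 1, 1, 1]
eigengenes = {
    "blue": [-0.4, -0.3, -0.35, -0.2, 0.3, 0.4, 0.25, 0.5],
    "turquoise": [0.4, 0.3, 0.35, 0.2, -0.3, -0.4, -0.25, -0.5],
    "grey": [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2],
}
correlations = {m: spearman(me, trait) for m, me in eigengenes.items()}
selected = {m: r for m, r in correlations.items() if abs(r) > 0.75}
```

Using the absolute value keeps both positively and negatively correlated modules, which matches how the cutoff is phrased above.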

Fig. 6

Identification and sorting of a hub gene module correlated with PCa. A Correlation between PPI, WGCNA, and GO enrichment; the branches of the cluster dendrogram of the most connected genes gave rise to 6 gene co-expression modules, B heat map of the correlation between module eigengenes and phenotype, and (C) intergenic connectivity of Sertoli cells’ genes in the turquoise module

3.9 Classification of cell types based on scRNA-seq data of tumor models

To identify cell type-specific intercellular communication, we first determined the cell type of each individual cell. Because of limitations of the scRNA-seq technique, such as suboptimal mRNA capture efficiency, the data may contain genes that are expressed but were not detected. This phenomenon is often referred to as "zero dropout", and it makes it impractical to identify the cell type of every cell in the sample from individual marker genes. We therefore refined a previously established supervised classification method for assigning cell types. First, we manually defined the list of cell types to be investigated in the dataset. Next, we identified marker genes that serve as indicators for each cell type; these are listed in the section titled "Determining Gene Markers for Syngeneic Tumor Models". To determine whether each cell is positive or negative for each marker gene, we fitted Gaussian mixture models to the expression levels of each marker gene and assigned each cell in the dataset to one of the mixture components. We fitted Gaussian mixture models with one to five components to account for possible multimodality in gene expression (Fitting Gaussian Mixture Models to Determine Marker Expression). With one exception (Rpl29), the gene expression profiles were best fitted by two-component mixture models, as determined by the Bayesian information criterion (BIC) used for model selection (Fig. 7).
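The marker-calling idea described above can be sketched with a small one-dimensional Gaussian mixture fitted by expectation–maximisation and compared across one to five components using BIC. This is a simplified stand-in, not the authors' code: the quantile-based initialisation, the fixed iteration count, the variance floor, and the synthetic expression values are all assumptions made for illustration.

```python
import math
import random


def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]


def responsibilities(x, weights, means, variances):
    """Posterior probability of each component given x."""
    dens = component_densities(x, weights, means, variances)
    total = sum(dens)
    if total == 0.0:                      # numerical underflow guard
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]


def fit_gmm_1d(values, k, n_iter=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM."""
    n = len(values)
    ordered = sorted(values)
    # Deterministic start: means at evenly spaced quantiles.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    grand_mean = sum(values) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in values) / n, var_floor)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        resp = [responsibilities(x, weights, means, variances) for x in values]
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj,
                var_floor)
    return weights, means, variances


def bic(values, k):
    """BIC = (number of free parameters) * ln(n) - 2 * log-likelihood."""
    weights, means, variances = fit_gmm_1d(values, k)
    log_lik = sum(
        math.log(max(sum(component_densities(x, weights, means, variances)), 1e-300))
        for x in values)
    return (3 * k - 1) * math.log(len(values)) - 2.0 * log_lik


def call_marker_positive(values):
    """Two-component fit; positive = higher-mean component is more likely."""
    weights, means, variances = fit_gmm_1d(values, 2)
    high = 0 if means[0] > means[1] else 1
    return [responsibilities(x, weights, means, variances)[high] > 0.5
            for x in values]


# Synthetic bimodal "expression" values for one marker gene (illustration only).
rng = random.Random(0)
expression = [rng.gauss(0.0, 1.0) for _ in range(100)] + \
             [rng.gauss(6.0, 1.0) for _ in range(100)]
bic_by_k = {k: bic(expression, k) for k in range(1, 6)}
best_k = min(bic_by_k, key=bic_by_k.get)   # lowest BIC wins
calls = call_marker_positive(expression)
```

BIC penalises each additional component, so extra components are accepted only when they improve the likelihood enough to justify their parameters.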

Fig. 7

Interaction scores correlate with relevant characteristics of the tumor microenvironment. Cell–cell interactions involving Tregs in human metastatic averaged across PCa. The cell type labels are written as (A) cell type expressing the ligand, (B) cell type expressing the receptor, and (C) interaction score correlations. Heatmap showing the Spearman correlation of interaction scores with tumor growth. Interactions marked with black circles indicate correlations with p < 0.01 in (D) receptor, (E) ligand only, and (F) interaction score correlations

After classifying cells as positive or negative for each marker gene, we created a training dataset of high-confidence cells that displayed the marker gene profile required for a given cell type. We identified tumor cells from several syngeneic models: 171 B16-F10 cells, 3,345 CT26 cells, 433 EMT6 cells, 472 LL2 cells, 780 MC-38 cells, and 23 Sa1N cells. We also identified immune and stromal cells: 13 B cells, 62 cancer-associated fibroblasts (CAFs), 21 endothelial cells, 495 macrophages, 23 natural killer (NK) cells, 142 T cells, and 55 dendritic cells (DCs). The training dataset contained around 65% (6,035 out of 9,232) of all cells and included samples from all syngeneic models: 220 from B16-F10, 3,497 from CT26, 491 from EMT6, 512 from LL2, 867 from MC-38, and 448 from Sa1N. The training dataset was conservative because it excluded cells affected by zero dropout. We then used this high-confidence dataset to train a supervised decision tree classifier that used the whole gene expression profile to predict the cell type of all remaining cells (Fig. 7A–C). To mitigate overfitting, we selected only the 500 genes with the highest variability in the dataset. We then applied principal-component analysis to further reduce the dimensionality of the input data, and used as input features only the principal components that together accounted for 95% of the variation in gene expression. This made the classification more robust to missing and noisy data. We assessed the accuracy of the classifier using fivefold cross-validation on the training dataset.
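Two pieces of this training procedure, variance-based gene selection and fivefold cross-validation, can be sketched with the standard library alone. To stay dependency-free, the sketch omits the PCA step and swaps in a nearest-centroid classifier where the text uses a decision tree, so it illustrates the evaluation workflow rather than the authors' model. The toy matrix and labels are hypothetical.

```python
import random


def top_variable_genes(matrix, n_top):
    """matrix: one row of gene values per cell.
    Returns column indices of the n_top highest-variance genes."""
    n_cells, n_genes = len(matrix), len(matrix[0])
    variances = []
    for g in range(n_genes):
        column = [cell[g] for cell in matrix]
        mean = sum(column) / n_cells
        variances.append(sum((v - mean) ** 2 for v in column) / n_cells)
    return sorted(range(n_genes), key=lambda g: variances[g], reverse=True)[:n_top]


def kfold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k disjoint folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_centroids(rows, labels):
    """Stand-in classifier: mean feature vector per class."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        acc = sums.setdefault(lab, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Label of the centroid closest to row (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))


def cross_validated_accuracy(rows, labels, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for fold in kfold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(rows)) if i not in held_out]
        centroids = fit_centroids([rows[i] for i in train],
                                  [labels[i] for i in train])
        correct += sum(predict(centroids, rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Toy data: 20 cells x 3 genes; gene 0 separates the two classes.
labels = ["tumor" if i < 10 else "immune" for i in range(20)]
matrix = [[(10.0 if lab == "tumor" else 0.0) + (i % 3) * 0.1, 1.0, (i % 2) * 0.5]
          for i, lab in enumerate(labels)]
selected_genes = top_variable_genes(matrix, n_top=2)
reduced = [[cell[g] for g in selected_genes] for cell in matrix]
accuracy = cross_validated_accuracy(reduced, labels, k=5)
```

Selecting genes on the full matrix before splitting, as done here for brevity, can leak information between folds. A stricter evaluation would repeat the selection inside each training fold.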

Using the trained classifier, we predicted the cell type label of every cell in the dataset. We calculated the probability of each label and kept only cells whose assigned cell type label had a probability greater than 95%. Typically, around 6% of cells in each model did not receive a cell type label with a probability above 95%. These unassigned cells may be doublets or cells of a type not covered by the initial set of markers. Consistent with findings from studies of single-cell RNA sequencing data in humans (Puram et al., 2017; Tirosh et al., 2016), individual cells from the murine syngeneic models formed distinct clusters by model for the cancerous cells and by cell type for the non-cancerous cells, except for macrophages. Given that macrophages can change their characteristics depending on the tissue they are in (Biswas and Mantovani, 2010), it is plausible that the grouping of macrophages by tumor type reflects the unique microenvironment of each tumor.
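The confidence filter on predicted labels can be sketched as follows. This minimal illustration assumes the classifier returns one probability per candidate cell type for each cell. The probabilities shown are hypothetical.

```python
def assign_labels(probabilities, threshold=0.95):
    """probabilities: one dict per cell mapping cell type -> probability.
    Returns the most likely label per cell, or None when no label is
    predicted with a probability above the threshold."""
    labels = []
    for probs in probabilities:
        best = max(probs, key=probs.get)
        labels.append(best if probs[best] > threshold else None)
    return labels


# Hypothetical predicted probabilities for three cells.
predicted = [
    {"T cell": 0.98, "NK cell": 0.02},       # kept
    {"T cell": 0.60, "NK cell": 0.40},       # unassigned (ambiguous)
    {"Macrophage": 0.97, "DC": 0.03},        # kept
]
cell_labels = assign_labels(predicted)
unassigned_fraction = cell_labels.count(None) / len(cell_labels)
```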

3.10 Scoring cell–cell interactions using known ligand–receptor interactions

After establishing the cell types, we scored the possible interactions between all cell types found in the tumor microenvironment. We used a compilation of around 1,800 established, literature-supported interactions. These comprise receptor–ligand interactions from the chemokine, cytokine, receptor tyrosine kinase (RTK), and tumor necrosis factor (TNF) families, as well as extracellular matrix (ECM)–integrin interactions. We also manually added known B7 family member interactions because of their importance in cancer immunology (Fig. 7D–F).
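One way to turn a ligand–receptor list into cell type-level scores is sketched below. The scoring rule is an assumption, since the text does not define it: the score is taken as the mean ligand expression in the sending cell type multiplied by the mean receptor expression in the receiving cell type. Gene names, cell types, and values are placeholders.

```python
def mean_expression(cells, gene):
    """Average expression of a gene over cells (dicts of gene -> value)."""
    values = [cell.get(gene, 0.0) for cell in cells]
    return sum(values) / len(values) if values else 0.0


def interaction_scores(cells_by_type, ligand_receptor_pairs):
    """Score every (ligand, receptor, sender type, receiver type) combination.
    Assumed score: mean ligand in sender * mean receptor in receiver."""
    scores = {}
    for ligand, receptor in ligand_receptor_pairs:
        for sender, sender_cells in cells_by_type.items():
            for receiver, receiver_cells in cells_by_type.items():
                scores[(ligand, receptor, sender, receiver)] = (
                    mean_expression(sender_cells, ligand)
                    * mean_expression(receiver_cells, receptor))
    return scores


# Hypothetical expression values for two cell types and one ligand-receptor pair.
cells_by_type = {
    "T cell": [{"LIG1": 2.0}, {"LIG1": 4.0}],
    "Macrophage": [{"REC1": 5.0}, {"REC1": 1.0, "LIG1": 0.0}],
}
pairs = [("LIG1", "REC1")]
scores = interaction_scores(cells_by_type, pairs)
```

Because senders and receivers are scored separately, the result is directional: the score for T cell to macrophage can differ from the score for macrophage to T cell.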