Most human genes co-occur with the word “cancer” in the scientific literature. In fact, a Forum article in Trends in Genetics by de Magalhães surveyed the vast cancer-related literature for associations between cancer and human protein-coding genes and concluded that most genes (87.7%) can be labeled “cancer-associated” on the basis of existing publications. As alarming as this sounds, it is not necessarily surprising, since navigating the overwhelming volume of cancer research is a taxing challenge for the whole scientific community.
Even setting aside the confounding effect of this large body of research, and the intrinsic fuzziness of the term ‘association’ (after all, correlation does not imply causation), defining cancer genes remains challenging. First, the literature lacks consensus on what makes a true cancer gene, with various sources citing slightly differing definitions. This propagates to gene databases and compendia, where the number of cancer genes or oncogenes ranges from hundreds, as in the Cancer Gene Census (CGC, 729 genes), to thousands, as in the Atlas of Genetics and Cytogenetics in Oncology and Hematology (AGCOH, 1580-27K). The discrepancies in the number of cancer genes arise not only from differing definitions but also from compilation methods: some databases, such as the CGC, are fully manually curated, whereas others rely on natural language processing algorithms. Moreover, despite the large number of genes included in some of these databases, some gene types, such as non-coding (nc) genes, are missing entirely. Although some of these have been shown to play crucial roles in cancer regulation, they have received little attention compared with protein-coding genes and remain understudied and largely uncharacterized.
Another challenge in the field stems from experimental set-up. Distinguishing “causative” from “effectual” genes in cancer is experimentally difficult. The line between genes that are causally related to cancer, and are therefore potential drug targets, and genes that merely behave differently in response to cancer is not clear, and unfortunately most experimental designs are unable to address it. Furthermore, the highly heterogeneous nature of tumors is another crucial feature that must be considered. Since cancer tissue is composed not only of transformed cells but also of generally healthy cells that have been recruited to sustain the tumor microenvironment (e.g., stromal cells), the behavior of cancer genes is likely to be just as heterogeneous. Bulk analyses based on primary cultures or biopsies (the majority of studies) fall short of capturing this, because they average expression data across heterogeneous subpopulations of cells. Thus, single-cell technologies could be more suitable for the study of cancer genes.
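The averaging effect described above can be illustrated with a toy calculation. The sketch below is not taken from the study; the expression values, the cell counts, and the helper name `bulk_signal` are all hypothetical, and a bulk measurement is crudely approximated as the mean over cells.

```python
from statistics import mean, pvariance

def bulk_signal(cells):
    """Approximate a bulk measurement as the mean expression over all cells."""
    return mean(cells)

# Hypothetical expression values (arbitrary units) for a single gene.
normal_cells = [5.0] * 100             # homogeneous normal tissue
tumor_cells = [9.0] * 50 + [1.0] * 50  # two tumor subpopulations: up and down

# At the bulk level the two subpopulations cancel out.
bulk_fold_change = bulk_signal(tumor_cells) / bulk_signal(normal_cells)

# At the single-cell level the spread across cells exposes the heterogeneity.
normal_spread = pvariance(normal_cells)
tumor_spread = pvariance(tumor_cells)

print(f"bulk fold change: {bulk_fold_change}")
print(f"per-cell variance, normal vs tumor: {normal_spread} vs {tumor_spread}")
```

In this contrived case the gene looks unchanged in bulk even though no individual tumor cell expresses it at the normal level, which is the kind of signal a single-cell measurement can recover.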
Addressing all of these issues is a long journey. In our recent work, “Globally Invariant behavior of oncogenes and random genes at population but not at single cell level”, we explore some of the problems discussed so far. We used the CGC database, which we considered the most stringent in its selection of cancer genes owing to its manual curation, and investigated the behavior of cancer genes at bulk transcriptomic and proteomic scales (figure panel A).
We noticed that in paired normal and tumor patient tissues, CGC cancer genes behaved like any other identically sized subset of genes. These results can be understood in the context of biological networks, which are known to follow a scale-free or power-law organization, with a limited number of hubs (highly connected points) and many sparsely connected nodes (one or a few connections). A hub could represent an important gene, and any removal, mutation or drastic epigenetic change affecting a hub gene would exert a significant effect on the nearest nodes, and even a global effect on the farthest nodes. These would be the expected properties of true cancer genes, which have the potential to propagate changes throughout a whole system. However, since not all CGC genes are true cancer genes for every cancer type investigated, it is unreasonable to expect all of them to show the hub-like behavior we presume for true oncogenes. Instead, a specialized subset of genes could be the hubs in our gene regulatory networks.
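One generic way to ask whether a gene set behaves differently from identically sized random subsets is an empirical resampling test. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: the function name `random_subset_p_value`, the per-gene scores (think of a log fold change between tumor and normal) and the gene labels are all hypothetical.

```python
import random
from statistics import mean

def random_subset_p_value(scores, gene_set, n_draws=1000, seed=0):
    """Fraction of same-size random gene subsets whose absolute mean score
    is at least as large as that of gene_set (with a +1 correction)."""
    rng = random.Random(seed)
    genes = list(scores)
    observed = abs(mean(scores[g] for g in gene_set))
    hits = 0
    for _ in range(n_draws):
        subset = rng.sample(genes, len(gene_set))
        if abs(mean(scores[g] for g in subset)) >= observed:
            hits += 1
    return (hits + 1) / (n_draws + 1)

# Hypothetical scores for 200 genes, cycling through the values -3..3.
scores = {f"g{i}": float(i % 7 - 3) for i in range(200)}

shifted_set = [f"g{i}" for i in range(6, 76, 7)]    # ten genes, all scoring 3
invariant_set = [f"g{i}" for i in range(3, 73, 7)]  # ten genes, all scoring 0

print(random_subset_p_value(scores, shifted_set))
print(random_subset_p_value(scores, invariant_set))
```

A gene set that is indistinguishable from random subsets yields a large value here, whereas a set with a coordinated shift yields a small one. The same comparison can be run with any summary statistic in place of the mean.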
Based on this idea, we used the descriptor variables provided by the database to select specialized cancer genes (~20 cancer-specific oncogenes per cancer type, which we referred to as “CSO” in our study) and tested whether this limited subset of oncogenes would behave more like our assumed “hub genes”. Unexpectedly, those genes still showed invariant behavior on average in the studies and cancer types we examined. Despite these initial observations, in the context of biological networks at least, cancer genes do appear to be special: on average there are more hubs among them, they are better connected both globally and to one another, and they do not follow the same power-law distribution as the full set of genes. Even allowing for the bias that cancer genes tend to be more studied, and thus have more recorded network connections, our findings do set oncogenes apart from the rest of the human genes.
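The notions of connectivity and hubs used here can be made concrete with a small degree calculation. This is a minimal sketch on a made-up edge list: the node names, the `min_degree` cutoff and the helper functions are hypothetical, and real analyses would use a curated interaction network.

```python
from collections import Counter

def node_degrees(edges):
    """Count the connections of every node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

def find_hubs(degree, min_degree):
    """Nodes with at least min_degree connections (the cutoff is arbitrary)."""
    return {node for node, d in degree.items() if d >= min_degree}

# Toy network: one highly connected node and several sparsely connected ones.
edges = [
    ("HUB1", "G2"), ("HUB1", "G3"), ("HUB1", "G4"), ("HUB1", "G5"),
    ("G2", "G3"), ("G6", "G7"),
]

degree = node_degrees(edges)
hub_nodes = find_hubs(degree, min_degree=3)
mean_degree = sum(degree.values()) / len(degree)

print(hub_nodes, mean_degree)
```

Comparing the mean degree of a gene set with that of the whole network, or the share of hubs in each, gives a simple first view of whether the set is unusually well connected.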
Up to this point, our examination of cancer genes had not considered one of the issues we raised at the outset: tumor heterogeneity. When examining single-cell datasets, we once again observed generally invariant behavior for the whole set of cancer genes. This time, however, around 10% of cancer genes emerged as variant. Furthermore, within biological networks these variant genes showed significantly higher connectivity than randomly chosen differentially expressed genes. Thus, studying cancer at the single-cell scale could offer some clarity in tackling the issue of cancer genes.
In summary, cancer research still faces huge challenges; even the identification of ‘hub’ genes, such as key oncogenes, has not so far produced any anti-cancer treatment strategy with long-term success. It is timely to focus on a systemic understanding of how these key genes are regulated in space and time, through multi-dimensional and cross-disciplinary research. At the same time, the network structure implies the emergence of collective, ordered phenomena at a scale different from that of single genes, pushing cancer research to explore layers of explanation beyond the purely molecular.
- de Magalhães, J. P. Every gene can (and possibly will) be associated with cancer. Trends Genet. 38, 216–217 (2022).
- Marusyk, A., Almendro, V. & Polyak, K. Intra-tumour heterogeneity: a looking glass for cancer? Nat. Rev. Cancer 12, 323–334 (2012).
- Derbal, Y. Perspective on the dynamics of cancer. Theor. Biol. Med. Model. 14, (2017).
- Pucci, C., Martinelli, C. & Ciofani, G. Innovative approaches for cancer treatment: current perspectives and new challenges. Ecancermedicalscience 13, (2019).
- Hassanpour, S. H. & Dehghani, M. Review of cancer from perspective of molecular. J. Cancer Res. Pract. 4, 127–129 (2017).
- Klaunig, J. E. & Kamendulis, L. M. Carcinogenicity. Compr. Toxicol. Second Ed. 3, 117–138 (2010).
- Forbes, S. A. et al. COSMIC: Somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).