Chronic lymphocytic leukemia (CLL) is an indolent B-cell hematological malignancy that predominantly affects older adults. Several known clinical risk factors and genetic events affect its rate of progression and its response to therapy. New therapeutics introduced over the last decade have improved both the rate and the depth of responses, but resistance still occurs, and new biomarkers are needed to guide when to initiate therapy and to identify new therapeutic targets. Most FDA-approved drugs for CLL target proteins, so it is imperative to characterize the disease biology driven by both total and post-translationally modified (PTM) proteins for novel target discovery and for optimizing treatment paradigms. Prior proteomic studies in CLL examined small numbers of cases, too few to cover the true heterogeneity of CLL biology. Small sample sizes also make it difficult to observe how CLL biology is influenced by combinations of factors. In our study, we overcame these barriers by dissecting the proteomics of a diverse cohort of 795 CLL patients with Reverse Phase Protein Array (RPPA).
RPPA is a sensitive, antibody-based protein quantification method that requires only minute amounts of patient sample. In RPPA, large numbers of different samples (up to ~1100) are printed on each slide in a batch (we have printed up to 600 slides in a run), each slide is then probed individually with a highly validated antibody, and the resulting data are digitized for analysis. In comparison to mass spectrometry, where any protein can theoretically be detected, RPPA is limited by the ability to validate an antibody for a selected target and by the number of slides that can be printed, but it has the advantage that a very large number of samples can be analyzed simultaneously. To compensate for this limitation, we selected 384 antibodies against total and PTM proteins relevant to general cancer biology and specific to CLL.
In our previous and current manuscripts, we have shown that RPPA data are promising for identifying novel drug targets in several types of leukemia (1-6). To make sense of our proteomic data, we assumed that they were as heterogeneous as the stars in the sky. We therefore designed a method for finding recurrent patterns (constellations) among the proteins (stars). The entire proteomic cosmos comprises the “Metagalaxy”. The analysis combines several unbiased hierarchical clustering algorithms and does not use prior knowledge or risk factors to group patients before analysis. The approach first characterizes protein functional expression patterns and then builds systems-wide signatures based on similarities in co-occurring patterns (constellations). Lastly, we perform several types of statistical analyses to discern the overall relevance of individual proteins, protein functional groups, and signatures for clinical outcomes and therapy responses. The major advantage of the Metagalaxy approach is the wealth of new biological information gained from viewing the data in an unbiased manner. Just as the advances of the space race depicted in the film Hidden Figures required thinking beyond the known math (“Go No Go strategy”), we propose that exploring biology without prior assumptions could advance biological and therapeutic discoveries for CLL.
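The two-step idea described above (patterns within each functional group first, then signatures built from co-occurring patterns) can be illustrated with a minimal sketch. This is not the authors' pipeline: the average-linkage routine, the fixed cluster counts, the one-hot encoding of pattern membership, and the toy group names and values are all assumptions made for illustration only.

```python
from itertools import combinations
from math import dist


def average_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    Returns one cluster label per point (labels run from 0 to n_clusters - 1).
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        # Average pairwise Euclidean distance between members of a and b.
        return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def build_signatures(pfg_expression, patterns_per_group, n_signatures):
    """Step 1: cluster patients within each functional group into patterns.
    Step 2: cluster patients on their combined pattern memberships."""
    patterns = {
        group: average_linkage(values, patterns_per_group[group])
        for group, values in pfg_expression.items()
    }
    n_patients = len(next(iter(pfg_expression.values())))
    membership = []
    for patient in range(n_patients):
        row = []
        for group, labels in patterns.items():
            k = patterns_per_group[group]
            # One-hot encode which pattern this patient shows in this group.
            row += [1.0 if labels[patient] == p else 0.0 for p in range(k)]
        membership.append(row)
    return patterns, average_linkage(membership, n_signatures)


# Toy, made-up data: 6 hypothetical patients, 2 hypothetical functional
# groups with 2 proteins each (rows are patients, columns are proteins).
pfg_expression = {
    "apoptosis": [[-1.0, -1.1], [-0.9, -1.0], [1.0, 1.2],
                  [1.1, 0.9], [-1.0, -0.8], [0.9, 1.0]],
    "cell_cycle": [[0.1, 0.0], [0.0, 0.1], [2.0, 2.1],
                   [2.1, 1.9], [0.1, 0.1], [-2.0, -2.1]],
}
patterns, signatures = build_signatures(
    pfg_expression, {"apoptosis": 2, "cell_cycle": 3}, n_signatures=2
)
print(patterns)
print(signatures)
```

In this sketch the number of patterns per group and the number of signatures are supplied by hand. In a real analysis they would have to be chosen by a cluster-number selection method, which is outside the scope of this illustration.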
When we applied Metagalaxy to the CLL patient data, we verified that proteomic data are both biologically and clinically informative. We first compared CLL cells to their normal CD19+ B cell counterparts. The CLL proteome consisted of proteins with expression below (16%), above (2%), and similar to (82%) that of normal CD19+ B cells. When looking at proteins grouped by function (40 groups), we characterized a total of 150 unique protein functional group (PFG) expression patterns, the majority of which (71%) were leukemia specific. By hierarchical clustering of the PFG expression patterns, we classified CLL patients into signatures (groups of patients with similar PFG patterns). From the PFG information, we were able to discern which proteins and protein functions are used differently in CLL cells compared to normal cells, and to categorize CLL patients into six biological subtypes. Notably, one of the signatures consisted of a rare group (5% of CLL patients) whose proteomics resembled that of Hairy Cell Leukemia.
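To make the below/above/similar comparison with normal CD19+ B cells concrete, here is a small sketch of how such a tally could be computed. The decision rule (comparing the CLL cohort median with the range of the normal samples), the protein names, and all values are hypothetical assumptions, not the criteria used in the study.

```python
from collections import Counter
from statistics import median


def classify_vs_normal(cll_values, normal_values):
    """Label one protein as 'below', 'above', or 'like' normal.

    Assumed rule (hypothetical): compare the CLL cohort median with the
    range spanned by the normal CD19+ B cell samples.
    """
    low, high = min(normal_values), max(normal_values)
    cll_median = median(cll_values)
    if cll_median < low:
        return "below"
    if cll_median > high:
        return "above"
    return "like"


def summarize(cll_by_protein, normal_by_protein):
    """Return the per-protein calls and the fraction in each category."""
    calls = {
        protein: classify_vs_normal(values, normal_by_protein[protein])
        for protein, values in cll_by_protein.items()
    }
    counts = Counter(calls.values())
    fractions = {label: counts[label] / len(calls) for label in ("below", "above", "like")}
    return calls, fractions


# Toy, made-up normalized expression values for four hypothetical proteins.
cll = {
    "PROTEIN_A": [-1.2, -0.9, -1.5, -1.1],
    "PROTEIN_B": [0.1, -0.2, 0.0, 0.3],
    "PROTEIN_C": [1.4, 1.1, 1.8, 1.6],
    "PROTEIN_D": [0.2, 0.0, -0.1, 0.1],
}
normal = {
    "PROTEIN_A": [-0.3, 0.0, 0.2],
    "PROTEIN_B": [-0.4, 0.1, 0.4],
    "PROTEIN_C": [-0.2, 0.1, 0.5],
    "PROTEIN_D": [-0.3, 0.0, 0.3],
}
calls, fractions = summarize(cll, normal)
print(calls)
print(fractions)
```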
We investigated whether proteomics could be clinically relevant at three levels (individual proteins, protein functional groups, and signatures) and observed that proteomic information is strikingly prognostic. Cox proportional hazards analysis of all proteins revealed that 34% (130/384) of them were associated with survival outcomes in CLL. When assessing PFGs by Kaplan-Meier analysis, we found that the majority (80%) of PFGs were predictive of survival. Lastly, when assessing outcomes by systems-wide signature groups (SG), we observed that signatures were not only predictive of all outcomes but also independently predictive of survival outcomes. Remarkably, two signature groups (A and C) had poorer outcomes than the other signatures (which are possibly more representative of indolent CLL). The time to first treatment (TTFT) and time to next treatment were also heterogeneous across PFGs and SG, which could support better decisions on which patients to treat early and which to manage with a watch and wait strategy. Furthermore, we assessed whether signatures could be informative in addition to current CLL risk classifiers such as Rai staging, IGHV status, and CLL-IPI classification. For survival, we surprisingly found that signature groups remained prognostic within a given Rai stage, CLL-IPI group, or IGHV mutation status, but the reverse did not hold. The same was found for TTFT with respect to Rai and CLL-IPI, while IGHV and SG were complementary to each other.
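For readers unfamiliar with the Kaplan-Meier analysis mentioned above, the following is a minimal sketch of the product-limit estimator applied to two groups. The follow-up times, event flags, and group names are made-up toy values, not data from the study, and a real analysis would normally rely on an established survival analysis package and a formal test such as the log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each patient
    events: True if the event (e.g. death) was observed, False if censored
    Returns a list of (time, survival_probability) at each observed event time.
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += 1 if order[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        n_at_risk -= removed
    return curve


# Toy, made-up follow-up data (arbitrary time units) for two hypothetical groups.
signature_a = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
other_signatures = kaplan_meier([3, 5, 6, 8, 9], [False, True, False, False, True])

for name, curve in (("signature A", signature_a), ("other signatures", other_signatures)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimator can use incomplete follow-up without treating it as an event.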
Taken together, our results demonstrate the potential clinical application of proteomics to optimize CLL diagnostics, watch and wait strategies, and therapeutics. As this is the first study of its kind for CLL, further experimental validation of the biological and therapeutic findings is necessary. We welcome potential collaborations and are happy to answer questions or assist with analyses regarding the dataset or findings. Our figures and dataset will be publicly available as a resource for anyone to use (https://www.leukemiaatlas.org).
1. Hoff FW, Hu CW, Qiu Y, Ligeralde A, Yoo SY, Mahmud H, et al. Recognition of Recurrent Protein Expression Patterns in Pediatric Acute Myeloid Leukemia Identified New Therapeutic Targets. Mol Cancer Res. 2018;16(8):1275-86.
2. Hoff FW, Hu CW, Qiu Y, Ligeralde A, Yoo SY, Scheurer ME, et al. Recurrent Patterns of Protein Expression Signatures in Pediatric Acute Lymphoblastic Leukemia: Recognition and Therapeutic Guidance. Mol Cancer Res. 2018;16(8):1263-74.
3. Hoff FW, Hu CW, Qutub AA, Qiu Y, Hornbaker MJ, Bueso-Ramos C, et al. Proteomic Profiling of Acute Promyelocytic Leukemia Identifies Two Protein Signatures Associated with Relapse. Proteomics Clin Appl. 2019;13(4):e1800133.
4. van Dijk AD, Griffen TL, Qiu YH, Hoff FW, Toro E, Ruiz K, et al. RPPA-based proteomics recognizes distinct epigenetic signatures in chronic lymphocytic leukemia with clinical consequences. Leukemia. 2021.
5. Hu CW, Qiu Y, Ligeralde A, Raybon AY, Yoo SY, Coombes KR, et al. A quantitative analysis of heterogeneities and hallmarks in acute myelogenous leukaemia. Nat Biomed Eng. 2019;3(11):889-901.
6. Hoff FW, Van Dijk AD, Qiu Y, Hu CW, Ries RE, Ligeralde A, et al. Clinical relevance of proteomic profiling in de novo pediatric acute myeloid leukemia: a Children's Oncology Group study. Haematologica. 2022.