Precision medicine requires disease-specific molecular classification methods that accurately reflect disease processes and clinical behavior1-2. At the same time, the generation of massive multidimensional molecular data, spanning nucleic acids (DNA/RNA), proteins, and small molecules, has become a consistent research trend 3-4. However, systematic normalization and extensive, computation-intensive data filtering are indispensable for effective multidimensional data integration in disease-specific molecular classification 5-6.
Advances in DNA reactions and in silico classifiers provide a powerful, generalizable means of molecular classification7-8. However, these approaches are mainly limited to nucleic acid species and remain difficult to extend to proteins or metabolic small molecules because of the heterogeneous nature of the binding processes involved. A remaining challenge in realizing DNA-based multidimensional molecular classifiers is to design a signal reporter that can translate multidimensional molecular information into a unified output signal in a programmable manner.
The highly programmable nature of Watson-Crick base pairing in DNA delivers a spectrum of valence-controlled programmable atom-like nanostructures (PANs) for colloidal assembly with different composition, size, chirality, and linearity9-10. In particular, DNA tetrahedral frameworks (DTFs) provide a simple means to fabricate three-dimensional PANs with ordered structure and versatile modification11-12. Here we developed a PAN-based molecular classifier that can physically implement computational classification of multidimensional molecular clinical data. The atom-like programmable nature of DTFs supports the design of valence-controlled PAN signal reporters, which linearly translate virtually any class of molecular binding event, not only nucleic acid hybridization, into unified electrochemical sensor signals. We demonstrate the PAN-based molecular classifier by performing biomarker panel screening and analyzing a panel of six biomarkers across three dimensions of data types for near-deterministic molecular taxonomy of prostate cancer (PCa) patients. We further developed a diagnosis panel screening system using PAN reporters for Gleason score-related classification.
Why is this important?
Two factors make this important: one is the requirement of precision medicine, and the other is the massive volume of data of different dimensionality. The valence-encoded PAN signal reporter developed here realizes precise molecular classification of PCa by exploiting DNA frameworks, as validated by translating multidimensional molecules (RNA, protein, and metabolic small molecules) into unified electrochemical signals with a smart molecular weighting design. Using the PAN reporters, virtually any multidimensional in silico classifier model can be translated into its molecular counterpart. We note that DTFs represent a type of three-dimensional PAN with simple design and atom-like valency, which can be programmably modified with signal moieties on their vertices and therefore used as signal reporters to realize linear programming of signal gain. Moreover, DTF dimers and even multimers make it possible to expand the copy number of signaling moieties on PAN reporters from one to six or even more.
To achieve suitable sensitivity for detection in biological samples, existing molecular classifiers usually rely on nucleic acid amplification methods such as PCR or DNA catalytic reactions7-8. These methods may bias the original abundance relationships of RNAs, because RNAs of various lengths and compositions can be amplified with differing efficiencies13. Our multidimensional molecular classifier was designed to minimize such bias. In our design, we translated molecular binding events into the recruitment of PAN reporters carrying a precise number of signal moieties (HRP enzymes), which allows the signal gain for each molecule to be programmed linearly and preserves the abundance relationship between the various molecules. Moreover, the number of weights could be expanded by generating DTF trimers, tetramers, and even oligomers, which avoids the problem of crosstalk between DNA probe reporters that arises when scaling up the previously reported multi-gene and microRNA classifiers7.
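The weighting idea above can be illustrated with a minimal sketch, under stated assumptions rather than as the authors' implementation. It assumes that each in silico weight is strictly positive and is rounded to an integer reporter valence between one and six (the range mentioned in the text for DTF dimers), and that the unified signal is the valence-weighted sum of bound analyte amounts. The function names, biomarker labels, and numeric values are hypothetical.

```python
from typing import Dict

MAX_VALENCE = 6  # upper bound taken from the text (one to six signal moieties)


def quantize_weights(weights: Dict[str, float],
                     max_valence: int = MAX_VALENCE) -> Dict[str, int]:
    """Scale positive in silico weights to integer valences in 1..max_valence.

    Assumption: the largest weight maps to max_valence and the rest scale
    proportionally; handling of negative weights is not covered here.
    """
    if not weights:
        raise ValueError("no weights given")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("this sketch only handles strictly positive weights")
    largest = max(weights.values())
    return {name: max(1, round(w / largest * max_valence))
            for name, w in weights.items()}


def unified_signal(valences: Dict[str, int], bound: Dict[str, float]) -> float:
    """Sum of (signal moieties per reporter) x (bound amount) over biomarkers."""
    return sum(valences[name] * bound[name] for name in valences)


# Hypothetical classifier weights and bound amounts (arbitrary units).
example_weights = {"rna_1": 0.9, "protein_1": 0.3, "metabolite_1": 0.15}
example_bound = {"rna_1": 1.0, "protein_1": 2.0, "metabolite_1": 0.5}

example_valences = quantize_weights(example_weights)
example_signal = unified_signal(example_valences, example_bound)
```

Because the gain per biomarker is a fixed integer multiplier, the relative abundances of the inputs are preserved in the summed signal, which is the property the amplification-free design aims for.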
Compared with existing methods for gene expression analysis and mass spectrometry-based metabolomic profiling, our classifier, integrated with electrochemical signal readout, is general and rapid. The cost of a single test is only $6.3, and our PAN reporter is simple to prepare and can be successfully synthesized even by undergraduate students without prior knowledge of this field. Because of the simplicity and generality of the experimental operation, the overall workflow of our method took ~2 hours and required no data interpretation. The electrochemical chip showed great potential for high-throughput measurement, which supports cost-effective and miniaturized instrumentation. In contrast, quantitative reverse transcription PCR allows multiplexed gene expression analysis but requires expensive reagents and instruments and complex procedures14. Metabolomic profiling of metabolic small molecules requires large and expensive mass spectrometry instruments15. By comparison, our PAN reporters can be prepared in bulk with high stability, and the products of a single preparation could be used for thousands or even tens of thousands of experiments. Therefore, the system has potential for practical applications.
Precise differentiation of patients from healthy individuals is clinically important. Our multidimensional molecular classifier achieved PCa diagnosis with an AUC of 100%, as validated with six biomarkers across three dimensions of data types, offering greater potential for processing multidimensional molecular clinical data than other diagnostic methods. Comparison with exemplary technologies including PCR, ELISA, and microelectronic chips.
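For readers less familiar with the AUC metric quoted above, the following is a minimal sketch of how it can be computed from classifier output signals, using the pairwise (Mann-Whitney) formulation. An AUC of 100% corresponds to a value of 1.0, meaning every patient sample scores higher than every healthy sample. The function name and the example scores are hypothetical and are not data from this study.

```python
from typing import Sequence


def pairwise_auc(patient_scores: Sequence[float],
                 healthy_scores: Sequence[float]) -> float:
    """Fraction of (patient, healthy) pairs ranked correctly; ties count 0.5."""
    if not patient_scores or not healthy_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in patient_scores:
        for h in healthy_scores:
            if p > h:
                wins += 1.0
            elif p == h:
                wins += 0.5
    return wins / (len(patient_scores) * len(healthy_scores))


# Hypothetical, perfectly separated signals (arbitrary units).
perfect = pairwise_auc([3.0, 4.0], [1.0, 2.0])
# Hypothetical overlapping signals for contrast.
overlapping = pairwise_auc([1.0, 3.0], [2.0, 2.0])
```

With perfectly separated groups the function returns 1.0; overlap between the groups lowers the value toward 0.5, the level expected from random ranking.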
The molecular classifiers provide a ready-to-use tool for biomarker panel screening, as validated with PCa diagnosis as a model, although this proof-of-concept test remains to be challenged by prospective testing in more realistic settings. One should also keep in mind that the selection of high-affinity aptamers for small molecules, in particular those that work well in complex biological matrices, remains difficult. Given the ever-increasing molecular information from gene, RNA, protein, and metabolomic profiling of diseases, our multidimensional molecular classifiers for analyzing multidimensional molecular biomarkers shed new light on precision diagnosis and therapy.
1 Collins, F. S. & Varmus, H. A New Initiative on Precision Medicine. N. Engl. J. Med. 372, 793-795 (2015).
2 Thomasian, N. M., Kamel, I. R. & Bai, H. X. Machine intelligence in non-invasive endocrine cancer diagnostics. Nat. Rev. Endocrinol. 18, 81-95 (2022).
3 Krzywinski, M. & Savig, E. Multidimensional data. Nat. Methods 10, 595 (2013).
4 Luo, Y. et al. A multidimensional precision medicine approach identifies an autism subtype characterized by dyslipidemia. Nat. Med. 26, 1375-1379 (2020).
5 Tarazona, S. et al. Harmonization of quality metrics and power calculation in multi-omic studies. Nat. Commun. 11, 3092 (2020).
6 Lopez de Maturana, E. et al. Challenges in the Integration of Omics and Non-Omics Data. Genes 10, 238 (2019).
7 Lopez, R., Wang, R. & Seelig, G. A molecular multi-gene classifier for disease diagnostics. Nat. Chem. 10, 746-754 (2018).
8 Zhang, C. et al. Cancer diagnosis with DNA molecular computation. Nat. Nanotechnol. 15, 709-715 (2020).
9 Yao, G. et al. Meta-DNA structures. Nat. Chem. 12, 1067-1075 (2020).
10 Yao, G. et al. Programming nanoparticle valence bonds with single-stranded DNA encoders. Nat. Mater. 19, 781-788 (2020).
11 Li, J. et al. Encoding quantized fluorescence states with fractal DNA frameworks. Nat. Commun. 11, 2185 (2020).
12 Wiraja, C. et al. Framework nucleic acids as programmable carrier for transdermal drug delivery. Nat. Commun. 10, 1147 (2019).
13 Iscove, N. N. et al. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat. Biotechnol. 20, 940-943 (2002).
14 Xie, N. G. et al. Designing highly multiplex PCR primer sets with Simulated Annealing Design using Dimer Likelihood Estimation (SADDLE). Nat. Commun. 13, 1881 (2022).
15 Sreekumar, A. et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 457, 910-914 (2009).