Accelerating Drug Discovery with Machine Learning for Targeted Protein Degraders

In our recent paper titled "Application of machine learning models for property prediction to targeted protein degraders", we demonstrate the suitability of machine learning-based quantitative structure-property relationship models for targeted protein degraders.

Introduction

Targeted protein degraders (TPDs) have emerged as a promising new modality in drug discovery. These molecules selectively degrade disease-causing proteins and have the potential to tackle diseases that were previously considered "undruggable". However, machine learning (ML) models have rarely been applied to predict TPD properties, and their suitability for this task has been questioned. In our recent paper titled "Application of machine learning models for property prediction to targeted protein degraders", we demonstrate that ML-based quantitative structure-property relationship (QSPR) models are suitable for predicting various properties of TPD molecules. Our work sheds light on the potential of ML to accelerate drug discovery for TPDs.

Comprehensive Evaluation of ML for TPDs

Our study evaluated ML models for predicting ADME (absorption, distribution, metabolism, and excretion) and physicochemical properties of TPDs, covering both the molecular glue and heterobifunctional submodalities. We developed and tested ML models on existing experimental data and obtained accurate predictions for relevant TPD properties such as passive permeability, metabolic clearance, and lipophilicity. Surprisingly, the models performed comparably on TPDs and on other modalities, with the molecular glue submodality showing the lowest prediction errors.

ML models’ performance on TPDs and other modalities. Model results are shown for fifteen ADME and physicochemical properties (described in the paper). Reported are the mean absolute error (MAE) values for glues (blue), heterobifunctionals (orange), and all other compounds (green). Models are compared to a baseline prediction (gray), i.e., the mean of the training set.
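
To make the comparison in the figure concrete, the sketch below computes the MAE for each compound group alongside the baseline that always predicts the training-set mean. It is a minimal illustration with made-up toy numbers, not the paper's evaluation code; the function names `mae` and `mae_by_group` and the group labels are our own, chosen for illustration.

```python
from statistics import mean


def mae(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted| over all compounds."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae_by_group(y_true, y_pred, groups, train_y):
    """Return {group: (model MAE, baseline MAE)} for each compound group.

    The baseline ignores structure entirely and always predicts the mean of
    the training-set labels, like the gray bars in the figure.
    """
    baseline_value = mean(train_y)
    results = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        t = [y_true[i] for i in idx]
        p = [y_pred[i] for i in idx]
        results[group] = (mae(t, p), mae(t, [baseline_value] * len(t)))
    return results


# Toy, made-up values for a single property (not data from the paper).
train_y = [1.0, 2.0, 3.0]
y_true = [1.0, 3.0, 2.5, 0.5]
y_pred = [1.25, 2.75, 2.0, 1.5]
groups = ["glue", "glue", "heterobifunctional", "heterobifunctional"]
print(mae_by_group(y_true, y_pred, groups, train_y))
# {'glue': (0.25, 1.0), 'heterobifunctional': (0.75, 1.0)}
```

A model adds value for a group only if its MAE sits clearly below that group's baseline MAE; in this toy example both groups beat the baseline, but the heterobifunctional group by a smaller margin.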

Challenges and Refinements

Our results revealed that predictions for heterobifunctional TPDs remain more challenging than those for molecular glues. However, transfer learning strategies, such as fine-tuning models on heterobifunctional TPD data, improved predictive performance across different ADME endpoints. This suggests room for further refinement and improvement as more data become available.
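
As a toy illustration of the warm-start idea behind fine-tuning, the sketch below pre-trains a one-descriptor linear model on "general" data with gradient descent and then continues training from those weights on a small "heterobifunctional" set. The data, the linear model, and all names (`fit_linear`, `general_x`, `hetero_x`, and so on) are hypothetical; the models in the paper are different, and this only demonstrates the transfer-learning mechanism, not the authors' implementation.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.01, epochs=2000):
    """Full-batch gradient descent on mean squared error for y ≈ w * x + b.

    Passing existing (w, b) continues training from those weights instead of
    from zero; that warm start is what makes the second call "fine-tuning".
    """
    n = len(xs)
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(2 * r for r in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, xs):
    return [w * x + b for x in xs]


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical "general" data: one descriptor x, property y = 2x + 1.
general_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
general_y = [2 * x + 1 for x in general_x]

# Hypothetical small "heterobifunctional" set: larger x and a shifted relationship.
hetero_x = [6.0, 7.0, 8.0]
hetero_y = [2 * x + 3 for x in hetero_x]

# 1) Pre-train on the general data.
w0, b0 = fit_linear(general_x, general_y)
# 2) Fine-tune: start from (w0, b0) and train only on the heterobifunctional data.
w1, b1 = fit_linear(hetero_x, hetero_y, w=w0, b=b0)

before = mae(hetero_y, predict(w0, b0, hetero_x))
after = mae(hetero_y, predict(w1, b1, hetero_x))
print(f"MAE on heterobifunctional set: original {before:.2f}, fine-tuned {after:.2f}")
```

The pre-trained model is off by about 2 units on the shifted toy set, and a short fine-tuning run from its weights brings the error down substantially without discarding what was learned from the larger dataset.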

Performance of original and refined (transfer learning) models on heterobifunctional TPDs. Reported are mean absolute error (MAE) values for the original ML models (red) and for two fine-tuning strategies: (i) on new data (yellow) and (ii) on heterobifunctional data only (purple). Shown are bootstrapping results (n = 1000) for heterobifunctional TPD compounds across five ADME assays (described in the paper).
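
Bootstrapping estimates how much an MAE value would vary under a different sample of test compounds: the (true, predicted) pairs are resampled with replacement many times, and the MAE is recomputed for each replicate. The following is a minimal sketch, assuming a simple percentile interval and toy data; the names and values are illustrative and not taken from the paper.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error over paired true/predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_mae(y_true, y_pred, n_boot=1000, seed=0):
    """Return n_boot MAE values, each computed on a resample (with replacement)
    of the (true, predicted) pairs. A fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    n = len(y_true)
    maes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        maes.append(mae([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return maes


def percentile_interval(values, lower=2.5, upper=97.5):
    """Percentile interval: pick the sorted value at the nearest index for each percentile."""
    s = sorted(values)

    def pick(q):
        return s[round(q / 100 * (len(s) - 1))]

    return pick(lower), pick(upper)


# Toy values for one assay (illustrative only).
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.5, 2.0, 2.0, 4.5, 6.0]
maes = bootstrap_mae(y_true, y_pred)
low, high = percentile_interval(maes)
print(f"MAE {mae(y_true, y_pred):.2f}, 95% bootstrap interval [{low:.2f}, {high:.2f}]")
```

Resampling whole (true, predicted) pairs, rather than errors from different models independently, keeps each compound's prediction tied to its measured value.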

The Potential of Surrogate Datasets

Our study provides a surrogate dataset of over 270,000 structures, annotated with in-house model predictions for twenty-five molecular properties. We showcased the potential of ML-based QSPR models built with such surrogate data; this dataset therefore offers exciting prospects for advancing ML models in the public domain.

Scheme of surrogate dataset generation. Public compound structures were extracted from ChEMBL, ZINC, and PROTAC-DB, and annotated with our in-house ML predictions. The surrogate dataset contains ~274,000 compounds with predicted data for twenty-five properties.
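
The pooling-and-annotation step in the scheme can be sketched as follows. This is a hypothetical outline, not the authors' pipeline: it assumes structures arrive as already-canonicalized SMILES strings (a real pipeline would typically canonicalize them first with a cheminformatics toolkit so that duplicates match), and the `placeholder_score` predictor is a stand-in for the in-house models, not a real property model.

```python
from typing import Callable, Dict, Iterable, List


def build_surrogate_dataset(
    sources: Dict[str, Iterable[str]],
    predictors: Dict[str, Callable[[str], float]],
) -> List[dict]:
    """Pool structures from several sources, drop duplicates, and attach one
    predicted value per property.

    Assumption: structures are already canonical SMILES, so exact string
    matching is enough to detect duplicates across sources.
    """
    seen: Dict[str, List[str]] = {}  # SMILES -> names of sources it came from
    for source_name, smiles_list in sources.items():
        for smi in smiles_list:
            origins = seen.setdefault(smi, [])
            if source_name not in origins:
                origins.append(source_name)

    records = []
    for smi, origins in seen.items():
        record = {"smiles": smi, "sources": origins}
        for prop, predict in predictors.items():
            record[prop] = predict(smi)  # one predicted value per property
        records.append(record)
    return records


# Toy inputs: the SMILES and the scoring function are placeholders.
sources = {
    "ChEMBL": ["CCO", "c1ccccc1"],
    "ZINC": ["CCO", "CC(=O)O"],
    "PROTAC-DB": ["CC(=O)O"],
}
predictors = {"placeholder_score": lambda smi: float(len(smi))}
records = build_surrogate_dataset(sources, predictors)
for r in records:
    print(r)
```

Keeping the list of originating sources on each record makes it easy to later split or filter the surrogate data by provenance, for example to look at degrader-like structures separately.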

Implications for Pharmaceutical Research

The integration of ML models into the design-make-test-analyze (DMTA) cycle of drug discovery has already proven beneficial in prioritizing compound ideas and experiments. However, ML models have been used far less for TPDs than for traditional modalities. Our findings support the applicability of ML to TPDs and have implications for pharmaceutical research: ML models show promise for accelerating the design of TPDs with favorable ADME properties, and their use should therefore be encouraged in TPD programs.
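
As a simple illustration of prediction-based prioritization within a DMTA cycle, the sketch below filters candidate ideas by predicted ADME values and ranks the rest. The property names, thresholds, units, and ranking rule are hypothetical choices for illustration only; real programs weigh many more endpoints.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    pred_permeability: float  # predicted passive permeability (hypothetical units, higher is better)
    pred_clearance: float     # predicted metabolic clearance (hypothetical units, lower is better)


def prioritize(candidates: List[Candidate], min_perm: float, max_cl: float) -> List[Candidate]:
    """Keep ideas whose predictions pass both thresholds, then rank them by
    predicted permeability (descending), breaking ties by lower clearance."""
    passing = [
        c for c in candidates
        if c.pred_permeability >= min_perm and c.pred_clearance <= max_cl
    ]
    return sorted(passing, key=lambda c: (-c.pred_permeability, c.pred_clearance))


ideas = [
    Candidate("A", 5.0, 10.0),
    Candidate("B", 2.0, 5.0),   # fails the permeability threshold
    Candidate("C", 8.0, 30.0),  # fails the clearance threshold
    Candidate("D", 6.0, 8.0),
]
ranked = prioritize(ideas, min_perm=3.0, max_cl=20.0)
print([c.name for c in ranked])  # ['D', 'A']
```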

Conclusion

The ability of ML models to predict ADME properties of TPDs, combined with the potential for improvement through transfer learning, could enable more efficient drug design and advance the field of TPDs in drug discovery. As more TPD data become available, including through surrogate datasets, additional modeling strategies can be explored, opening up opportunities to predict TPD properties from molecular structure more accurately.

  
