Behind the Paper

Accelerating Drug Discovery with Machine Learning for Targeted Protein Degraders

In our recent paper titled "Application of machine learning models for property prediction to targeted protein degraders", we demonstrate the suitability of machine learning-based quantitative structure-property relationship models for targeted protein degraders.

Published in Chemistry and Computational Sciences

Sep 11, 2024

Raquel Rodríguez-Pérez

AD & Senior Principal Data Scientist, Novartis



Introduction

Targeted protein degraders (TPDs) have emerged as a promising new modality in drug discovery. These molecules can selectively degrade disease-causing proteins and have the potential to tackle diseases that were previously considered "undruggable". However, the use of machine learning (ML) models to predict TPD properties has been limited, and their applicability to this modality has been questioned. In our recent paper titled "Application of machine learning models for property prediction to targeted protein degraders", we demonstrate the suitability of ML-based quantitative structure-property relationship (QSPR) models for predicting various properties of TPD molecules. Our work sheds light on the potential of ML to accelerate drug discovery for TPDs.

Comprehensive Evaluation of ML for TPDs

Our study evaluated ML models for the prediction of ADME (absorption, distribution, metabolism, and excretion) and physicochemical properties of TPDs, including both the molecular glue and heterobifunctional submodalities. We developed and tested ML models using existing experimental data and found that they accurately predict relevant TPD properties such as passive permeability, metabolic clearance, and lipophilicity. Surprisingly, the performance of these ML models on TPDs was comparable to their performance on other modalities, with the molecular glue submodality showing the lowest prediction errors.

**ML models’ performance on TPDs and other modalities.** Model results are shown for fifteen ADME and physicochemical properties (described in the paper). Reported are the mean absolute error (MAE) values for glues (blue), heterobifunctionals (orange) and all other compounds (green). Models are compared to a baseline prediction (gray), i.e., the mean of the training set.
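The comparison in this figure comes down to two numbers per compound group: the MAE of the model, i.e. the average of |measured − predicted| over the test compounds, and the MAE of a constant baseline that always returns the training-set mean. A minimal Python sketch of both follows, assuming plain lists of measured and predicted values for a single property; the function names, group labels, and toy numbers are illustrative and are not taken from the paper's code.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline_mae(y_train, y_test):
    """MAE of a constant predictor that always outputs the training-set mean."""
    constant = mean(y_train)
    return mean_absolute_error(y_test, [constant] * len(y_test))


def mae_by_group(y_true, y_pred, groups):
    """MAE computed separately for each group label (e.g. glue / heterobifunctional / other)."""
    buckets = {}
    for t, p, g in zip(y_true, y_pred, groups):
        trues, preds = buckets.setdefault(g, ([], []))
        trues.append(t)
        preds.append(p)
    return {g: mean_absolute_error(ts, ps) for g, (ts, ps) in buckets.items()}


# Hypothetical toy values for a single property endpoint.
y_train = [1.0, 2.0, 3.0]
y_test = [1.5, 2.5, 3.5, 2.0]
y_pred = [1.4, 2.7, 3.3, 2.1]
groups = ["glue", "heterobifunctional", "other", "heterobifunctional"]

print("model MAE:   ", mean_absolute_error(y_test, y_pred))
print("baseline MAE:", mean_baseline_mae(y_train, y_test))
print("per group:   ", mae_by_group(y_test, y_pred, groups))
```

A model adds value for a property only when its MAE sits clearly below that of the mean-of-training-set baseline, which is why the baseline is plotted next to the per-group model errors.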

Challenges and Refinements

Our results revealed that predictions for heterobifunctional TPDs remain more challenging. However, transfer learning strategies, such as fine-tuning models with heterobifunctional TPD data, improved predictive performance across different ADME endpoints. This suggests room for further refinement as more data become available.
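In its simplest form, fine-tuning means continuing to train an already-trained model on a smaller, more specific dataset instead of starting from scratch. The paper describes the actual models and fine-tuning setup; the sketch below only illustrates the idea with a hypothetical one-feature linear model trained by full-batch gradient descent. The functions `train`, `predict`, and `mae`, the toy "general" and "heterobifunctional" data, and the learning rate and epoch counts are all assumptions made for illustration.

```python
def predict(params, x):
    w, b = params
    return w * x + b


def train(xs, ys, params=(0.0, 0.0), lr=0.01, epochs=5000):
    """Full-batch gradient descent on mean squared error for y ~ w*x + b.

    Passing existing `params` warm-starts training; that warm start is what
    "fine-tuning" means in this toy example.
    """
    w, b = params
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            grad_w += 2 * err * x / n
            grad_b += 2 * err / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def mae(params, xs, ys):
    return sum(abs(predict(params, x) - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a larger "general" set and a small "heterobifunctional"
# set whose relationship is systematically shifted.
general_x = [float(i) for i in range(10)]
general_y = [2 * x + 1 for x in general_x]
hetero_x = [1.0, 2.0, 3.0, 4.0]
hetero_y = [2 * x + 3 for x in hetero_x]

original = train(general_x, general_y)                                # pretraining
fine_tuned = train(hetero_x, hetero_y, params=original, epochs=2000)  # fine-tuning

print(f"original MAE on heterobifunctional set:   {mae(original, hetero_x, hetero_y):.3f}")
print(f"fine-tuned MAE on heterobifunctional set: {mae(fine_tuned, hetero_x, hetero_y):.3f}")
```

Because the pretrained parameters are reused as the starting point, the fine-tuned model only has to learn the systematic shift of the small dataset; this mirrors, in a very reduced form, fine-tuning on heterobifunctional data only.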

**Performance of original and refined models (transfer learning) on heterobifunctional TPDs.** Reported are mean absolute error (MAE) values for the original ML models (red) and for two fine-tuning strategies: (i) on new data (yellow) and (ii) on heterobifunctional data only (purple). Shown are bootstrapping results (n = 1000) for heterobifunctional TPD compounds across five ADME assays (described in the paper).
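Bootstrapping turns a single test-set MAE into a distribution: the compounds are resampled with replacement many times (here n = 1000, matching the caption) and the MAE is recomputed on each resample. A minimal Python sketch follows, assuming lists of measured and predicted values; the function names, the nearest-rank percentile interval, and the toy numbers are illustrative assumptions rather than the paper's exact procedure.

```python
import random
from statistics import mean


def bootstrap_mae(y_true, y_pred, n_boot=1000, seed=0):
    """Return n_boot MAE values, each computed on a resample (with replacement) of the compounds."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)  # fixed seed so the resamples are reproducible
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(abs_errors)
    # Resampling per-compound absolute errors is equivalent to resampling
    # (measured, predicted) pairs, because MAE depends only on those errors.
    return [mean(rng.choices(abs_errors, k=n)) for _ in range(n_boot)]


def percentile_interval(values, lower=2.5, upper=97.5):
    """Simple percentile interval using the nearest rank in the sorted values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[round(lower / 100 * last)], ordered[round(upper / 100 * last)]


# Hypothetical measurements and predictions for a few heterobifunctional compounds.
measured = [0.8, 1.9, 2.4, 3.1, 1.2, 2.7]
original_pred = [1.5, 1.2, 3.3, 2.2, 2.0, 1.9]
fine_tuned_pred = [1.0, 1.7, 2.7, 2.8, 1.5, 2.4]

for label, preds in [("original", original_pred), ("fine-tuned", fine_tuned_pred)]:
    maes = bootstrap_mae(measured, preds)
    low, high = percentile_interval(maes)
    print(f"{label}: mean bootstrap MAE {mean(maes):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

Comparing the resulting distributions, rather than two single MAE values, shows whether an improvement from fine-tuning is larger than the variation caused by which compounds happen to be in the test set.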

The Potential of Surrogate Datasets

Our study also provides a surrogate dataset of over 270,000 structures, annotated with in-house model predictions for twenty-five molecular properties. We showcase the potential of using such surrogate data with ML-based QSPR models, so this dataset offers exciting prospects for advancing ML models in the public domain.

**Scheme of surrogate dataset generation.** Public compound structures were extracted from ChEMBL, ZINC, and PROTAC-DB, and annotated with our in-house ML predictions. The surrogate dataset contains ~274,000 compounds with predicted data for twenty-five properties.
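The generation scheme amounts to three steps: pool structures from several public sources, remove duplicates, and attach one predicted value per property from each trained model. A minimal Python sketch under stated assumptions: structures are SMILES strings, duplicates are detected by exact string match (a real pipeline would usually canonicalize SMILES first, for example with RDKit), and the `models` mapping holds hypothetical stand-in predictors rather than the in-house models used for the actual dataset. The function name `build_surrogate_dataset` is likewise illustrative.

```python
from typing import Callable, Dict, Iterable, List


def build_surrogate_dataset(
    sources: Dict[str, Iterable[str]],
    models: Dict[str, Callable[[str], float]],
) -> List[dict]:
    """Pool structures from several sources, drop exact duplicates, and annotate them."""
    origins: Dict[str, List[str]] = {}
    for source_name, structures in sources.items():
        for smiles in structures:
            key = smiles.strip()
            if key:  # skip empty entries
                origins.setdefault(key, []).append(source_name)

    rows = []
    for smiles, found_in in origins.items():
        row = {"smiles": smiles, "sources": sorted(set(found_in))}
        for prop_name, predict in models.items():
            row[prop_name] = predict(smiles)  # a predicted, not measured, label
        rows.append(row)
    return rows


# Toy inputs: the source names follow the scheme, but the structures and the
# "models" are hypothetical placeholders, not real property predictors.
sources = {
    "ChEMBL": ["CCO", "c1ccccc1O"],
    "ZINC": ["CCO", "CCN"],
    "PROTAC-DB": ["CC(=O)N", "  "],
}
models = {
    "toy_property_a": lambda smi: float(len(smi)),
    "toy_property_b": lambda smi: float(smi.count("C") + smi.count("c")),
}

dataset = build_surrogate_dataset(sources, models)
for row in dataset:
    print(row)
```

Keeping the list of originating sources for each structure makes it straightforward to filter the pooled table later, for example to PROTAC-DB entries only.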

Implications for Pharmaceutical Research

The integration of ML models into the design-make-test-analyze (DMTA) cycle of drug discovery has already proven beneficial for prioritizing compound ideas and experiments. However, the use of ML models for TPDs has been relatively limited compared to traditional modalities. Our findings shed light on the applicability of ML to TPDs and have implications for pharmaceutical research. Specifically, ML models show promise for accelerating the design of TPDs with favorable ADME properties, and their use should therefore be encouraged in TPD programs.

Conclusion

The ability of ML models to predict ADME properties for TPDs, combined with the potential for improvement through transfer learning, could enable more efficient drug design and advance the field of TPDs in drug discovery. As the availability of data for TPDs continues to increase, including through surrogate datasets, additional modeling strategies can be explored, opening up opportunities to predict TPDs’ properties more accurately from molecular structures.


Dr. Raquel Rodríguez-Pérez is Associate Director & Senior Principal Data Scientist at Novartis Biomedical Research. In her current role, she integrates AI technologies into decision-making in drug discovery projects. She enables AI-driven drug design by leveraging techniques such as molecular property predictions, generative chemistry, uncertainty estimation, and explainable machine learning. Her research has focused on predictive modeling and pattern recognition for different applications in chemistry and life sciences.


Nature Communications


An open access, multidisciplinary journal dedicated to publishing high-quality research in all areas of the biological, health, physical, chemical and Earth sciences.

