Behind the Paper

Fast and effective molecular property prediction with transferability map

Transfer learning improves molecular property prediction when labeled data are limited, yet it suffers from negative transfer when the source and target tasks are insufficiently related. We develop a principal gradient-based measurement that evaluates transferability before transfer learning is applied, significantly improving performance.

Published in Chemistry

Apr 19, 2024

Shaolun Yao

Ph.D. Student, Zhejiang University

Challenge

Molecular property prediction is widely considered one of the most critical tasks in computational drug and materials discovery, as many methods rely on predicted molecular properties to evaluate, select, and generate molecules. With the development of artificial intelligence (AI), transfer learning has shown a clear advantage in molecular property prediction when labeled molecules are scarce. However, many existing methods either fail to account quantitatively for the relationship between source and target properties, which risks negative transfer, or require intensive training on the target task. A fast and effective method is therefore needed to quantify how suitable a source property is for a target property before any training on the target task.

Approach

To address this challenge, we propose a simple, fast, and effective Principal Gradient-based Measurement (PGM) to quantify the transferability from a source property to a target property (Figure 1). First, inspired by the role of gradients in capturing intrinsic task-related characteristics during model optimization, we design a restart scheme to calculate a principal gradient in an optimization-free manner. The distance between the principal gradient obtained on the source dataset and the one obtained on the target dataset indicates transferability. Second, we build a quantitative transferability map by applying PGM to various molecular property prediction datasets, which shows how the properties are related to one another in property space. The map is extensible and can serve as a reference standard for transfer learning in molecular property prediction, even when only a few target samples are available. Third, using the map, we can identify the most suitable source dataset for a given target dataset and transfer from it, which improves performance on the target task and avoids negative transfer.

Figure 1. Illustration of Principal Gradient-based Measurement (PGM) for guiding transfer learning in molecular property prediction.
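To make the idea concrete, here is a minimal toy sketch of a restart-based principal gradient and a distance between two of them. It is not the implementation from the paper. Several choices are assumptions made purely for illustration: a linear model with mean-squared-error loss stands in for the molecular encoder, the principal gradient is taken as the plain average of gradients over random restarts, and the distance is Euclidean between unit-normalized gradients. All function names (`principal_gradient`, `pgm_distance`, `make_dataset`) are hypothetical.

```python
import math
import random

def mse_gradient(X, y, w):
    """Gradient of mean-squared error for a linear model y_hat = X.w."""
    n, d = len(X), len(w)
    grad = [0.0] * d
    for row, label in zip(X, y):
        residual = sum(wi * xi for wi, xi in zip(w, row)) - label
        for j in range(d):
            grad[j] += 2.0 * residual * row[j] / n
    return grad

def principal_gradient(X, y, n_restarts=8, seed=0):
    """Average the gradient at several random initializations (restarts).

    Assumption: plain averaging. No optimizer step is taken, so the
    estimate is optimization-free. The shared seed gives every dataset
    the same set of initializations, which keeps gradients comparable.
    """
    rng = random.Random(seed)
    d = len(X[0])
    total = [0.0] * d
    for _ in range(n_restarts):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]  # fresh restart
        g = mse_gradient(X, y, w)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_restarts for t in total]

def pgm_distance(g_a, g_b):
    """Euclidean distance between unit-normalized gradients (assumed metric)."""
    na = math.sqrt(sum(v * v for v in g_a))
    nb = math.sqrt(sum(v * v for v in g_b))
    return math.sqrt(sum((a / na - b / nb) ** 2 for a, b in zip(g_a, g_b)))

def make_dataset(weights, n=300, seed=0):
    """Toy regression data: random features, labels from a linear rule."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in weights] for _ in range(n)]
    y = [sum(w * x for w, x in zip(weights, row)) for row in X]
    return X, y

# Synthetic stand-ins for property datasets (not molecular data).
target = make_dataset([2.0, -1.0, 0.5, 3.0], seed=1)
related = make_dataset([2.1, -0.9, 0.4, 3.1], seed=2)      # similar labeling rule
unrelated = make_dataset([-2.0, 1.0, -0.5, -3.0], seed=3)  # opposite labeling rule

g_target = principal_gradient(*target)
d_related = pgm_distance(principal_gradient(*related), g_target)
d_unrelated = pgm_distance(principal_gradient(*unrelated), g_target)
print(f"related: {d_related:.3f}, unrelated: {d_unrelated:.3f}")
```

In this toy setting the source whose labeling rule resembles the target's ends up closer to the target than the source with the opposite rule, which is the behavior a transferability distance is meant to capture.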

Results

We evaluate PGM thoroughly on 12 benchmark datasets from MoleculeNet covering various molecular property prediction tasks. We build a quantitative transferability map to show the task-relatedness between these datasets, and then use the map to guide cross-task transfer learning. Specifically, each of the 12 datasets is used in turn as the target dataset, while the remaining 11 datasets serve as source datasets. First, a model is trained on each source dataset to obtain pre-trained models. Then, each of these pre-trained models is fine-tuned on the target dataset. As depicted in Figure 2, there is a significant correlation between the predicted transferability and the transfer learning performance across the various tasks.

Figure 2. Comparison of the PGM distance and the transfer performance on the 12 target datasets.
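The map-guided selection step and the comparison between distance and transfer performance can be sketched as follows. This is an illustrative sketch only: the dataset names and all numbers are made-up placeholders, not results from the paper, and it assumes that a smaller distance means higher transferability and that a higher fine-tuned score is better. The helper names (`best_source`, `spearman`) are hypothetical.

```python
def rank(values):
    """Rank positions (0 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def best_source(distance_row):
    """Pick the source dataset with the smallest distance to the target."""
    return min(distance_row, key=distance_row.get)

# Placeholder values for one target dataset (NOT results from the paper).
distance_to_target = {"source_A": 0.42, "source_B": 0.15,
                      "source_C": 0.88, "source_D": 0.60}
finetuned_score = {"source_A": 0.79, "source_B": 0.84,
                   "source_C": 0.70, "source_D": 0.74}

names = sorted(distance_to_target)
rho = spearman([distance_to_target[name] for name in names],
               [finetuned_score[name] for name in names])
chosen = best_source(distance_to_target)
print(chosen, rho)
```

A strongly negative rank correlation in such a check would mean that sources closer to the target tend to give better fine-tuned performance, which is the pattern that justifies selecting the nearest source from the map.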

Furthermore, we extend the transferability map from the properties above to the subtasks within them. We also conduct ablation studies that examine three aspects of PGM: its computational efficiency, its performance as the size of the target dataset changes, and its behavior when dataset sizes vary across tasks. These results show that the proposed approach can serve as fast and effective guidance for improving transfer performance in molecular property prediction.

Highlights

We propose a method for quantifying transferability between molecular property prediction datasets. Specifically, we design a principal gradient that approximates model optimization; computing it on the source and target datasets yields a transferability measure between them. Furthermore, we build a transferability map based on PGM to assess task-relatedness before applying transfer learning. Both theoretical and empirical studies demonstrate that PGM strongly correlates with the transfer performance of molecular property prediction, making it a quantitative transferability measure for source dataset selection. This work contributes to more efficient discovery of drugs, materials, and catalysts by quantifying task-relatedness before transfer learning and by helping to clarify the relationships between chemical properties.

For more detail on the experiments and results, please read our paper:

https://www.nature.com/articles/s42004-024-01169-4

Communications Chemistry
