Behind the Paper

Fast and effective molecular property prediction with transferability map

Transfer learning improves molecular property prediction on limited datasets, yet it suffers from negative transfer when the source and target tasks are insufficiently related. We develop a principal gradient-based measurement that evaluates transferability before transfer learning is applied, significantly improving transfer performance.

Published in Chemistry

Apr 19, 2024

Shaolun Yao

Ph.D. Student, Zhejiang University

Challenge

Molecular property prediction is widely considered one of the most critical tasks in computational drug and materials discovery, as many methods rely on predicted molecular properties to evaluate, select, and generate molecules. With the development of artificial intelligence (AI), transfer learning has become an effective way to predict molecular properties when labeled molecules are scarce. However, many existing methods either fail to account for the quantitative relationship between source and target properties, which risks negative transfer, or require intensive training on the target task. A fast and effective method is therefore needed to quantify how suitable a source property is for a target property before any training on the target task.

Approach

To address this challenge, we propose a simple, fast, and effective Principal Gradient-based Measurement (PGM) to quantify the transferability from a source property to a target property (Figure 1). First, inspired by the predictive role of gradients in capturing intrinsic task-related characteristics for model optimization, we design a restart scheme to calculate a principal gradient in an optimization-free manner. The distance between the principal gradient obtained from the source dataset and the one derived from the target dataset indicates transferability. Second, we build a quantitative transferability map by performing PGM on various molecular property prediction datasets, which shows the inter-property correlations in the property space. The map is extensible and can serve as a reference standard for transfer learning in molecular property prediction, even when only a few target samples are available. Third, using the map, we can identify the most suitable source dataset for a given target dataset and transfer from it, which improves performance on the target task and avoids negative transfer.

Figure 1. Illustration of Principal Gradient-based Measurement (PGM) for guiding transfer learning in molecular property prediction.
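The idea of a principal gradient and a distance between two of them can be illustrated with a toy sketch. This is not the implementation from the paper. It assumes that the restart scheme averages the loss gradient over several random re-initializations without taking any optimizer step, that the model is a plain linear regressor rather than a molecular encoder, and that the distance is the Euclidean distance between unit-normalized gradients. The function names (mse_gradient, principal_gradient, pgm_distance) and the synthetic tasks are hypothetical.

```python
import math
import random


def mse_gradient(weights, features, labels):
    """Gradient of the mean squared error of a linear model y = w . x."""
    n = len(features)
    grad = [0.0] * len(weights)
    for x, y in zip(features, labels):
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / n
    return grad


def principal_gradient(features, labels, n_restarts=20, seed=0):
    """Assumed restart scheme: average the gradient over random
    re-initializations, with no optimizer step in between."""
    rng = random.Random(seed)  # same seed -> same initializations for every task
    dim = len(features[0])
    total = [0.0] * dim
    for _ in range(n_restarts):
        weights = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        grad = mse_gradient(weights, features, labels)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_restarts for t in total]


def pgm_distance(grad_a, grad_b):
    """Assumed metric: Euclidean distance between unit-normalized gradients."""
    def unit(v):
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / norm for c in v]
    return math.dist(unit(grad_a), unit(grad_b))


def linear_labels(features, coeffs):
    """Synthetic labels for a toy regression task."""
    return [sum(c * xi for c, xi in zip(coeffs, x)) for x in features]


# Synthetic tasks that share inputs; real datasets would contain different molecules.
data_rng = random.Random(1)
features = [[data_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]
source_labels = linear_labels(features, [1.0, -2.0, 0.5, 3.0])
related_labels = linear_labels(features, [1.1, -1.9, 0.4, 3.2])
unrelated_labels = linear_labels(features, [-1.0, 2.0, -0.5, -3.0])

g_source = principal_gradient(features, source_labels)
d_related = pgm_distance(g_source, principal_gradient(features, related_labels))
d_unrelated = pgm_distance(g_source, principal_gradient(features, unrelated_labels))
print(d_related, d_unrelated)
```

In this toy setting the task with similar coefficients lands much closer to the source than the task with opposite coefficients, which is the kind of signal a gradient-based measure is meant to expose before any fine-tuning takes place.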

Results

We evaluate PGM thoroughly on 12 benchmark datasets from MoleculeNet covering various molecular property prediction tasks. We build a quantitative transferability map to observe the task-relatedness between these datasets, and then apply a transferability map-guided cross-task transfer learning strategy. Specifically, each of the 12 datasets is used in turn as the target dataset, while the remaining 11 serve as source datasets. First, the model is trained on each source dataset to obtain pre-trained models. Then each pre-trained model is fine-tuned on the target dataset. As depicted in Figure 2, we observe a significant correlation between the predicted transferability and the transfer learning performance across tasks.

Figure 2. Comparison of the PGM distance and the transfer performance on the 12 target datasets.
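The map-guided selection and the correlation check described above can be sketched as follows. This is an illustrative sketch only. The dataset labels ("A", "B", "C", "T"), the distances, and the transfer scores are made-up placeholders and not results from the paper. It also assumes that a smaller PGM distance means a more suitable source and that a higher transfer score is better. The helpers best_source and spearman are hypothetical names.

```python
import math


def best_source(distance_map, target):
    """Pick the source with the smallest distance to the target
    (assumes smaller distance means higher transferability)."""
    candidates = {s: d for (s, t), d in distance_map.items() if t == target}
    return min(candidates, key=candidates.get)


def rank(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Placeholder map entries keyed by (source, target); values are invented.
distance_map = {("A", "T"): 0.2, ("B", "T"): 0.9, ("C", "T"): 0.5}
# Placeholder fine-tuning scores on target "T" per source; values are invented.
transfer_score = {"A": 0.81, "B": 0.64, "C": 0.72}

chosen = best_source(distance_map, "T")
sources = sorted(transfer_score)
rho = spearman(
    [distance_map[(s, "T")] for s in sources],
    [transfer_score[s] for s in sources],
)
print(chosen, rho)
```

A strongly negative rank correlation in such a check would mean that closer sources tend to transfer better, which is the pattern a useful transferability map should show.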

Furthermore, we extend the transferability map from the above properties to the subtasks within them. We also investigate the effectiveness of each module in PGM through ablation studies focusing on three key areas: the computational efficiency of PGM, its performance relative to the size of the target dataset, and its behavior under varying dataset sizes across different tasks. The results show that the proposed approach can serve as fast and effective guidance for improving transfer performance in molecular property prediction.

Highlights

We propose a method for quantifying transferability between molecular property prediction datasets. Specifically, we design a principal gradient that approximates model optimization; computing it on the source and target datasets yields a transferability measure between them. Furthermore, we build a transferability map based on PGM to assess task-relatedness before applying transfer learning. Both theoretical and empirical studies demonstrate that PGM strongly correlates with the transfer performance of molecular property prediction, making it a quantitative transferability measure for source dataset selection. This work contributes to more efficient discovery of drugs, materials, and catalysts by quantifying task-relatedness before transfer learning and by clarifying the relationships between chemical properties.

For more detail on the experiments and results, please read our paper:

https://www.nature.com/articles/s42004-024-01169-4

Communications Chemistry
