Quantum chemistry methods are important building blocks within fields as varied as materials science, drug discovery, catalysis, and molecular biology. Common applications include the computation of quantum mechanical properties such as HOMO (highest occupied molecular orbital) and LUMO (lowest unoccupied molecular orbital) energies and the HOMO-LUMO gap, molecular dipoles, rotational constants, atomic charges, molecular potential energies and forces, and more. Recent works have also focused on providing and predicting Hamiltonian matrices for molecules. Historically, such computations have been handled with numerical methods such as Hartree–Fock, post-Hartree–Fock, and Density Functional Theory (DFT). However, the most accurate algorithms, such as the “gold standard” coupled-cluster (post-Hartree–Fock) methods, scale extremely poorly in terms of computational resources, with algorithmic complexities given by high-degree polynomials (such as O(n⁷)) in the number of basis functions, which in turn usually grows with the number of electrons. To combat these limitations, semi-empirical methods have been proposed as an alternative to first-principles techniques; they incorporate experimental (empirical) data to estimate certain parameters that would otherwise be computationally expensive to derive. Such methods can be orders of magnitude faster than DFT.
At the same time, data-driven methods that learn from large datasets of quantum mechanical properties have long been at the forefront of machine learning research. Although such methods are often considered “black boxes” and depend on large datasets and the ability of models to generalise well, their dramatically better computational performance (e.g. predictions for millions of molecules in seconds) and high accuracy have led to increased interest in developing novel, accurate, and efficient methods. Prediction of quantum properties can be performed using general-purpose neural networks such as multi-layer perceptrons or graph neural networks. However, state-of-the-art algorithms exploit properties specific to molecules, such as 3D atomic coordinates or bond angles.
The design of 3D-aware models has continued to evolve, most often by increasing the expressivity of the feature extraction components of the neural network models. For example, one of the earliest and most well-known architectures, SchNet, introduced continuous-filter convolutions to learn directly from 3D coordinates and atomic types. More recently, DimeNet proposed a message passing scheme that uses directional information given by the angles between pairs of neighbouring bonds (atom triplets). Solutions that replace invariant operations (such as those in DimeNet) with more expressive equivariant alternatives have also been proposed, for example in PaiNN.
Neural networks designed for molecules naturally operate at the level of atoms, by starting with initial features such as the atom type and coordinates and then iteratively learning more informative representations through convolutions or message passing steps. However, most quantum properties of interest are formulated at the molecular level. As such, a technique for aggregating the learnt atom-level features into a single, molecular-level representation is required. In geometric deep learning and graph neural networks in particular, this responsibility is handled by the readout or pooling function. From a geometric deep learning perspective, the pooling function must satisfy permutation invariance with respect to the atom ordering, meaning that the function must output the same result regardless of the order in which the atom representations are presented. Due to this limiting constraint, most architectures rely on simple pooling functions that satisfy this demand, the standard choices being the sum, mean, or maximum functions.
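To make this concrete, below is a minimal sketch (not taken from the paper, and assuming per-atom features have already been computed by a network such as SchNet or DimeNet++) of the standard permutation-invariant readouts:

```python
import torch

def readout(atom_feats: torch.Tensor, mode: str = "sum") -> torch.Tensor:
    # atom_feats: [n_atoms, d] per-atom representations produced by the
    # feature-extraction network. Sum, mean, and max are all permutation
    # invariant: reordering the atoms does not change the molecular embedding.
    if mode == "sum":
        return atom_feats.sum(dim=0)
    if mode == "mean":
        return atom_feats.mean(dim=0)
    if mode == "max":
        return atom_feats.max(dim=0).values
    raise ValueError(f"unknown mode: {mode}")

# Sanity check: the output is identical under any atom permutation.
x = torch.randn(5, 128)
assert torch.allclose(readout(x, "mean"), readout(x[torch.randperm(5)], "mean"))
```

Note, however, that each atom contributes to the pooled embedding with the same fixed weight, which is exactly the limitation discussed below.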
Although the architectures designed for quantum property prediction have continued to advance, most of the efforts have been focused on the atom-level feature extraction components, with the pooling function lagging behind. In our work, we propose an alternative to standard pooling protocols and design a pooling function based on neural networks, more specifically leveraging the attention mechanism that also underlies the famous Transformer architecture. This alternative function is applicable to any quantum property, and to molecular property prediction in general. Furthermore, the attention alternative is a drop-in replacement requiring no changes to any of the other architectural components, regardless of the underlying network.
Here, we used SchNet and DimeNet (more specifically, the updated DimeNet++ variant) as the main architectures and benchmarked the predictive performance of attention-based pooling against the standard choices of sum and mean pooling. Our main focus was on frontier orbital energies such as the HOMO and LUMO, since they can be highly localised, or in other words the electron density can be concentrated on a small set of individual atoms or bonds. Such properties are, in theory, not well approximated by neural networks that rely on pooling functions that attribute equal importance to all atom representations, such as sum or mean. In contrast, other types of properties, such as the total energy, are a function of all atoms, and for these the standard pooling functions are a sensible choice. Technically, the proposed attention-based pooling function computes so-called attention scores for each possible atom pair within a molecule, allowing arbitrary contributions from any atom. Moreover, multiple attention layers can be stacked, enabling the modelling of higher-order interactions. Thus, the final molecular embedding can represent both delocalised and localised properties.
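A minimal sketch of this idea is shown below, using PyTorch's built-in nn.MultiheadAttention as a stand-in. The layer counts, residual connections, and module names are illustrative assumptions; the implementation in the paper may differ in details such as normalisation and feed-forward blocks:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    # Illustrative attention-based readout: self-attention layers compute
    # attention scores over all atom pairs, and stacking layers allows
    # higher-order interactions. A final sum keeps the readout
    # permutation invariant.
    def __init__(self, dim: int, heads: int = 4, layers: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(layers)
        )

    def forward(self, atom_feats: torch.Tensor) -> torch.Tensor:
        # atom_feats: [batch, n_atoms, dim]
        h = atom_feats
        for attn in self.blocks:
            out, _ = attn(h, h, h)   # pairwise attention over atoms
            h = h + out              # residual connection
        return h.sum(dim=1)          # [batch, dim] molecular embedding

pool = AttentionPooling(dim=128)
emb = pool(torch.randn(1, 9, 128))   # e.g. a 9-atom molecule -> [1, 128]
```

Because the attention scores are learnt, individual atoms can dominate the pooled embedding when the target property is localised, while near-uniform scores recover behaviour close to sum pooling.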
Our extensive evaluation includes datasets such as QM7b, QM8, QM9, QMugs, MD17, and OE62, each with its own set of properties and constraints such as dataset size and molecular diversity. Across all kinds of tasks, attention-based pooling consistently improved upon standard pooling mechanisms, for both SchNet and DimeNet++. Remarkably, attention performed better not only for frontier orbital energies, but also for a wide variety of quantum properties, including extensive and non-local properties like the total energy available in QMugs. Attention-based pooling was also preferable to standard pooling options for tasks operating in a low-data regime, in this case MD17 (between 1,500 and 2,000 molecules), contradicting the view that attention is only helpful on large datasets. For HOMO and LUMO, we also considered the OE62 dataset and the recently proposed Orbital Weighted Average (OWA) pooling method for more accurate orbital energy prediction. Despite OWA requiring the pre-computation of orbital coefficients and using a second neural network to predict the weight of each atom for the pooling function, we find that attention-based pooling outperforms OWA in all evaluated instances.
Due to the overall success of attention-based pooling on a wide selection of quantum tasks, we believe that it is a general technique that can be adopted by researchers for upcoming architectures. Although classical attention scales quadratically with the number of atoms in both time and memory, the pooling function presented here relies largely on self-attention, which benefits from recent innovations that deliver fast and exact attention. More specifically, implementations are available where the additional memory scales with the square root of the number of atoms (so-called “memory efficient attention”) or linearly in the number of atoms (FlashAttention, which also delivers substantial wall-clock speed-ups), and we have demonstrated their advantages in the paper.
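As a usage sketch (assuming a PyTorch 2.x environment; the tensor shapes are illustrative), such fused kernels are accessible through a single call that computes exact attention without materialising the full pairwise score matrix:

```python
import torch
import torch.nn.functional as F

# q, k, v: [batch, heads, n_atoms, head_dim]
q = torch.randn(1, 4, 64, 32)
k = torch.randn(1, 4, 64, 32)
v = torch.randn(1, 4, 64, 32)

# PyTorch dispatches to fused FlashAttention or memory-efficient kernels
# when the hardware and dtypes allow, falling back to the classical
# implementation otherwise. The result is exact, not approximate.
out = F.scaled_dot_product_attention(q, k, v)
```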