Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost

A graph convolutional neural network can be trained to reproduce quantum mechanical energetics within a kcal/mol for a fraction of the cost.

Published in Chemistry


For this paper, a collaboration between Peter St. John, Seonah Kim and Yeonjoon Kim at NREL and Yanfei Guan and Robert Paton at Colorado State University, we trained a graph convolutional neural network against nearly 300,000 organic bond dissociation enthalpies (BDEs). The resulting model, A machine-Learning derived, Fast, Accurate Bond dissociation Enthalpy Tool (ALFABET), predicts BDE values that are comparable to density functional theory (DFT) in much less than a second. The predictions can be run on the web: https://bde.ml.nrel.gov/
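
For readers new to the quantity being predicted: a homolytic BDE is the enthalpy change when a bond A-B breaks to give two radicals, BDE = H(A•) + H(B•) - H(A-B). The short Python sketch below only illustrates that arithmetic. The function name and the example enthalpies are placeholders made up for illustration, not values from the paper; the one fixed number is the standard hartree to kcal/mol conversion factor.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard unit conversion factor


def bond_dissociation_enthalpy(h_parent, h_radical_a, h_radical_b):
    """Return the BDE in kcal/mol from enthalpies given in hartree.

    BDE = H(A radical) + H(B radical) - H(parent molecule A-B)
    """
    return (h_radical_a + h_radical_b - h_parent) * HARTREE_TO_KCAL_PER_MOL


# Placeholder enthalpies (hartree), chosen only to show the arithmetic;
# they do not correspond to any real molecule.
bde = bond_dissociation_enthalpy(h_parent=-1.00, h_radical_a=-0.40, h_radical_b=-0.45)
print(f"{bde:.1f} kcal/mol")
```

In a DFT workflow the three enthalpies would come from separate calculations on the parent molecule and on each radical fragment, which is why every BDE costs several calculations.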

Machine learning approaches such as ours are “data hungry”, so we needed a large number of BDE values to train the model: far more, in fact, than are available experimentally. We compared a variety of quantum mechanical methods against the very useful iBonD database and found that M06-2X/def2-TZVP calculations agreed most closely. Using this level of theory, we built an automated workflow to fragment around 40,000 organic molecules in every way conceivable* and to launch and analyze the hundreds of thousands of DFT calculations required to obtain BDE values. Fragmentation and conformational analysis were done using Python and RDKit. In the paper we describe how using multiple DFT conformers, rather than just one, affects the results. Running this many calculations is challenging, particularly when they involve open-shell species (e.g., radicals), since there are various errors that must be checked and filtered in an automated fashion. In addition, we used a statistical model to detect outliers in the computational data, which were then removed.
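
To give a feel for the fragmentation step, here is a minimal pure-Python sketch of how candidate bonds might be enumerated. It is not our production code, which uses RDKit: the bond-list representation and the function names `is_bridge` and `breakable_bonds` are assumptions made for this illustration. It keeps only single bonds that are not part of a ring (a bond is outside every ring exactly when removing it disconnects its two atoms), and it does not attempt the stereocenter check mentioned in the footnote.

```python
from collections import defaultdict


def is_bridge(bonds, candidate):
    """True if removing `candidate` (i, j) disconnects atoms i and j,
    i.e. the bond is not part of any ring."""
    adjacency = defaultdict(set)
    for i, j, _order in bonds:
        if (i, j) != candidate:  # build the graph without the candidate bond
            adjacency[i].add(j)
            adjacency[j].add(i)

    start, target = candidate
    seen, stack = {start}, [start]
    while stack:  # depth-first search from one end of the bond
        atom = stack.pop()
        for neighbor in adjacency[atom]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return target not in seen


def breakable_bonds(bonds):
    """Return (i, j) for every single, non-ring bond in a list of
    (atom_i, atom_j, bond_order) tuples."""
    return [
        (i, j)
        for i, j, order in bonds
        if order == 1 and is_bridge(bonds, (i, j))
    ]


# Heavy-atom skeletons only (hydrogens omitted for brevity).
methylcyclopropane = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (1, 3, 1)]
propene = [(0, 1, 2), (1, 2, 1)]

print(breakable_bonds(methylcyclopropane))  # only the bond to the methyl group
print(breakable_bonds(propene))             # only the C-C single bond
```

With explicit hydrogens added to the bond list, every C-H bond would also be returned, since bonds to hydrogen are always single and never in a ring.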

To exemplify how ALFABET can be used, we predict the weakest C-H bond(s) in a series of pharmaceutical molecules and show that these sites are highly represented among the positions of metabolism (e.g., through oxidation by P450 enzymes). The analysis takes seconds and gives results comparable to much lengthier DFT calculations. We also show that, for a series of molecules used in fuel, the identities of the radicals formed by cleavage of the weakest bond can be used to develop a multivariate linear regression model that predicts the yield sooting index, a measurement of soot formation. Future extension of the ALFABET tool with additional DFT calculations will enable predictions for expanded atom types and zwitterionic functional groups.
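
Once per-bond predictions are in hand, picking out the weakest bond(s) is a simple lookup. The sketch below assumes the predictions have already been collected into a dictionary mapping a bond label to a BDE in kcal/mol. The function name `weakest_bonds`, the `tolerance` parameter, the labels and the numbers are all hypothetical and are not ALFABET output.

```python
def weakest_bonds(predictions, tolerance=0.0):
    """Return the labels of all bonds whose predicted BDE lies within
    `tolerance` (kcal/mol) of the lowest predicted BDE, sorted by label."""
    if not predictions:
        return []
    lowest = min(predictions.values())
    return sorted(
        label for label, bde in predictions.items() if bde - lowest <= tolerance
    )


# Hypothetical predicted C-H BDEs (kcal/mol) for three sites in one molecule.
predicted = {"C1-H": 98.0, "C2-H": 89.5, "C3-H": 90.0}

print(weakest_bonds(predicted))                 # strictly the weakest site
print(weakest_bonds(predicted, tolerance=1.0))  # sites within 1 kcal/mol of it
```

A small tolerance is one way to reflect that sites differing by less than the model's error cannot be ranked with confidence; the right value is a judgment call.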

Although the training data were generated using 3D structures, the graph neural network employed in this study requires only the 2D connectivity of each molecule, which is adequate for this task. Other properties, such as NMR chemical shifts, require explicit consideration of different molecular conformers, and therefore require 3D atomic features. We have begun to explore this area.

* With the exception of double bonds, ring bonds, and fragmentations that create new stereogenic centers (e.g., through the destruction of symmetry).

Read the paper: Prediction of organic homolytic bond dissociation enthalpies at near chemical accuracy with sub-second computational cost. Peter C. St. John, Yanfei Guan, Yeonjoon Kim, Seonah Kim & Robert S. Paton, Nat. Commun. 2020, 11, 2328.

https://www.nature.com/articles/s41467-020-16201-z
