Behind the Paper

Modeling solvent effects in chemical reactions

Solvent effects play a key role in chemical reactions. Traditional methods are either too simplistic or computationally expensive. We introduce an efficient training approach to generate reactive machine learning potentials with far less data and effort than state-of-the-art approaches.

Published in Chemistry and Physics

Jul 23, 2024

Fernanda Duarte, Hanwen Zhang & Veronika Juraskova

Most chemical reactions and nearly all biological processes occur in the liquid phase, with water being the most common solvent. The presence of solvent molecules is crucial, as they influence the stability of chemical species, the rate and mechanism of reactions, and the distribution of products. In organic chemistry, choosing the "right" solvent is key to the success of a synthesis. However, this choice is often based on empirical observations rather than a detailed understanding of how solvents influence reactions at the molecular level. While spectroscopic and computational methods are increasingly used to explore such effects, they often fall short of capturing the full complexity of these systems.

For us as computational chemists, balancing the accuracy and efficiency of our models is an everyday battle. Choosing the most suitable tools becomes particularly challenging for solvated systems, where explicitly including solvent molecules significantly increases the system size. Approaches to modeling solvent effects thus range from relatively cheap and simple continuum models, which approximate the solvent as a polarizable continuum, to computationally costly ab initio molecular dynamics (AIMD), where dynamical trajectories are generated using forces computed “on the fly” by solving the Schrödinger equation.

Frustrated by this ongoing struggle, our group turned to Machine Learning Potentials (MLPs) as an alternative to traditional classical and quantum methods for describing solvent effects. MLPs enable efficient mapping between nuclear configurations and energies/forces without the need to solve the Schrödinger equation directly for each structure. Moreover, unlike classical force fields, MLPs offer higher flexibility and the possibility for systematic improvement.

Building on previous work in our group, led by our colleagues Tom Young and Tristan Johnston-Wood, we implemented an Active Learning (AL) workflow to train reactive MLPs capable of describing organic reactions without relying on AIMD data [1,2]. As we began tackling more complex systems, we found that the bottleneck in the whole process was the selection of new, representative configurations to add to the training dataset. To address this, we refined the structure selection process with two key improvements: defining a new selector, and training on sub-systems that capture the intrinsic reactivity as well as the solute-solvent and solvent-solvent interactions. We used the Diels–Alder reaction of cyclopentadiene (CP) and methyl vinyl ketone (MVK) in explicit water and methanol as a representative system.
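The overall shape of such an active learning loop can be illustrated with a toy one-dimensional sketch. Everything in it is a hypothetical stand-in rather than the actual workflow or the mlp-train API: `reference_energy` replaces the ab initio calculation, a nearest-neighbour lookup replaces the MLP, a uniform walk along `x` replaces MLP-driven dynamics, and the distance-based `is_underrepresented` check with its `tolerance` is an assumed, simplified selector.

```python
def reference_energy(x):
    """Hypothetical stand-in for an expensive ab initio calculation."""
    return (x - 1.0) ** 2


def predict(x, training):
    """Toy 'potential': return the energy of the nearest training point."""
    nearest = min(training, key=lambda t: abs(x - t))
    return training[nearest]


def is_underrepresented(x, training, tolerance=0.25):
    """Assumed selector: flag x if no training point lies within tolerance."""
    return min(abs(x - t) for t in training) > tolerance


def explore(start, n_steps=40, step=0.05):
    """Stand-in for dynamics driven by the current potential."""
    return [start + i * step for i in range(1, n_steps + 1)]


def active_learning(start=0.0, max_iterations=50):
    # Begin with a single labelled configuration.
    training = {start: reference_energy(start)}
    for _ in range(max_iterations):
        # Walk through configuration space until the selector fires.
        selected = next(
            (x for x in explore(start) if is_underrepresented(x, training)),
            None,
        )
        if selected is None:
            break  # every explored configuration is already covered
        # Label the selected configuration and extend the training set;
        # with a real MLP, retraining would happen at this point.
        training[selected] = reference_energy(selected)
    return training


def max_error(training, start=0.0):
    """Largest prediction error over the explored configurations."""
    return max(abs(predict(x, training) - reference_energy(x))
               for x in explore(start))


initial = {0.0: reference_energy(0.0)}
final = active_learning()
print(len(final), max_error(initial), max_error(final))
```

The loop stops on its own once the selector no longer fires, which is why the quality of the selector largely determines how many reference calculations are needed.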

Traditionally, the selection step in AL strategies relies on the variance in the predicted energies and/or forces: configurations with high variance are identified as under-represented and added to the training set. In our work, we used a slightly different strategy. Instead of looking at the variance, we examined how well the training data cover the potential energy surface (PES) as represented in a feature space. To do so, we adopted the Smooth Overlap of Atomic Positions (SOAP) descriptor to represent the training data [3]. During the selection process, we either compare the SOAP similarity of a new configuration to the existing data or determine whether the new data point is an outlier with respect to the training set. We call these descriptor-based selectors.
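A minimal sketch of the similarity-based variant follows. It assumes each configuration has already been reduced to a single descriptor vector (short hand-written lists stand in for SOAP vectors here) and uses a normalised dot-product kernel raised to a power `zeta`, the general form of the SOAP kernel in ref. [3]. The function names are hypothetical, and the `threshold` and `zeta` values are illustrative choices, not the settings used in the paper.

```python
import math


def soap_like_kernel(p, q, zeta=4):
    """Normalised dot-product kernel raised to the power zeta."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return (dot / (norm_p * norm_q)) ** zeta


def max_similarity(candidate, training_set, zeta=4):
    """Similarity between the candidate and its closest training vector."""
    return max(soap_like_kernel(candidate, t, zeta) for t in training_set)


def should_select(candidate, training_set, threshold=0.99, zeta=4):
    """Select the candidate only if nothing in the training set resembles it."""
    return max_similarity(candidate, training_set, zeta) < threshold


# Toy descriptor vectors (assumed placeholders for real SOAP vectors).
training = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
near_duplicate = [1.0, 0.01, 0.0]
novel = [0.0, 1.0, 0.0]

print(should_select(near_duplicate, training))
print(should_select(novel, training))
```

A candidate that nearly duplicates existing data is skipped, while one pointing in a new direction of descriptor space is sent for a reference calculation. Raising `zeta` sharpens the kernel, so small structural differences lower the similarity more quickly.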

We also introduced a computational strategy to build training data more efficiently by using knowledge of the specific chemistry being studied. Specifically, we combined data sets that represent the reaction under study with sets describing solvent-solvent and solvent-solute interactions. This approach, combining descriptor-based selectors and sub-system data sets, produces accurate and data-efficient MLPs using only 600 configurations, compared with the several thousand required when using AIMD. The trained MLPs have already provided key insights into the origin of solvation effects on the Diels–Alder reaction, and will hopefully motivate the exploration of solvent effects more broadly. To facilitate this, we have automated the process and made it easy to use through our mlp-train package, which we continue to develop. We would welcome your feedback and suggestions.

[1] T. A. Young, T. Johnston-Wood, V. L. Deringer and F. Duarte, Chem. Sci., 2021, 12, 10944–10955.
[2] T. A. Young, T. Johnston-Wood, H. Zhang and F. Duarte, Phys. Chem. Chem. Phys., 2022, 24, 20820–20827.
[3] A. P. Bartók, R. Kondor and G. Csányi, Phys. Rev. B, 2013, 87, 184115.
