Setting the standard for machine learning in phase field prediction: a benchmark dataset and baseline metrics

Phase field modeling has long bridged the gap between atomic-scale behaviors and macroscopic phenomena, but its computational cost remains a barrier. Discover how machine learning accelerates these simulations without sacrificing accuracy, transforming the future of computational materials science.
Phase field models are indispensable for bridging atomic-scale behaviors and macroscopic phenomena. They excel at capturing complex microstructural evolution and phase transformation processes, such as solidification, grain growth, and phase separation. However, their reliance on solving computationally expensive differential equations limits their practicality, especially for long time-scale simulations or high-dimensional parametric studies. This challenge highlights the potential of machine learning (ML) to accelerate simulations while maintaining physical fidelity.

By training a machine learning model to predict several frames ahead in a single step, rather than advancing the solver one small time step at a time, we can accelerate phase field simulations.

ML offers a powerful alternative, enabling rapid predictions of phase field trajectories by "jumping forward" without iteratively solving complex equations (Figure 1). This approach dramatically improves the efficiency of exploratory and optimization tasks. Yet, despite its promise, progress in ML-accelerated phase field modeling has been hindered by a lack of standardization. The field has seen a surge of new algorithms, often presented without rigorous comparisons to previous methods, along with a persistent absence of open-access datasets and reproducible code. These gaps make it difficult to evaluate and benchmark advancements comprehensively.
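The "jumping forward" idea above can be sketched as an autoregressive rollout: a learned surrogate maps the field at time t directly to the field at t + k·Δt, replacing k solver iterations with one forward pass. The sketch below is illustrative only; `fake_surrogate` is a stand-in we made up for a trained model, not anything from the paper.

```python
import numpy as np

def rollout(surrogate, c0, n_hops):
    """Autoregressively apply a learned k-step 'jump' operator.

    surrogate: maps a field at time t to the field at t + k*dt,
    replacing k explicit solver steps with one forward pass.
    """
    frames = [c0]
    for _ in range(n_hops):
        frames.append(surrogate(frames[-1]))
    return frames

# Stand-in for a trained model: simple diffusive smoothing (illustrative only)
def fake_surrogate(c):
    return 0.25 * (np.roll(c, 1, 0) + np.roll(c, -1, 0)
                   + np.roll(c, 1, 1) + np.roll(c, -1, 1))

c0 = np.random.default_rng(1).random((32, 32))
traj = rollout(fake_surrogate, c0, 5)   # initial frame plus 5 predicted hops
```

Exploratory studies then only pay the cost of `n_hops` forward passes instead of the full solver trajectory.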

Adopting a new ML model for phase field prediction often involves significant overhead. Researchers must source or generate datasets, preprocess them for compatibility with common machine learning frameworks, implement the algorithm, and create evaluation methodologies. Typically only the algorithm itself represents novel work; the rest is repetitive effort that slows innovation. Simplifying and standardizing this process is essential to unlock the field's full potential.

Our motivation with this work is to lower the entry barrier and establish a framework that makes algorithms easily comparable, thereby accelerating progress in the field.

To do this, we introduce a benchmark dataset of phase field simulations, along with open-source code, baseline models, and standardized metrics for evaluation. This resource reduces barriers to entry, enabling researchers to focus on developing novel approaches rather than recreating foundational tools. One critical feature of our dataset is its ability to capture how domain size influences prediction difficulty, as seen in Figure 2. Larger or more complex domains exacerbate errors, underscoring the importance of representative and consistent datasets for benchmarking ML models.

Our dataset centers on phase field simulations of lithium iron phosphate (LFP) battery electrodes, a material system governed by the Cahn-Hilliard equation. This equation describes phenomena like spinodal decomposition, a key mechanism in the lithiation and delithiation dynamics of LFP nanoparticles. To model these dynamics, we tailored the chemical potential formulation to reflect LFP's unique phase transformations. The dataset includes over 1,100 simulation trajectories, capturing microstructural evolution under diverse initial conditions, particle sizes, and other parameters.

These simulations were generated using a computationally efficient implementation of the Cahn-Hilliard equation. Parameters such as mobility, gradient penalty coefficients, and regular solution free energy were carefully calibrated using experimental data and literature values. By leveraging high-performance computing (HPC) systems, we systematically captured the interplay of concentration fields within LFP nanoparticles, producing a robust dataset for training and evaluating ML models.
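For readers unfamiliar with the governing dynamics, here is a minimal explicit finite-difference sketch of a Cahn-Hilliard step: the chemical potential μ = f′(c) − κ∇²c drives the conserved flux ∂c/∂t = ∇·(M∇μ). This uses a generic double-well free energy and constant mobility for illustration; the actual dataset uses a chemical potential calibrated to LFP, so treat this as a toy model, not the paper's solver.

```python
import numpy as np

def laplacian(f, dx=1.0):
    # 5-point stencil with periodic boundary conditions
    return (np.roll(f, 1, 0) + np.roll(f, -1, 0)
            + np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f) / dx**2

def cahn_hilliard_step(c, dt=1e-3, M=1.0, kappa=1.0):
    # Chemical potential: double-well bulk term plus gradient penalty
    mu = c**3 - c - kappa * laplacian(c)
    # dc/dt = div(M grad mu); for constant M this is M * lap(mu)
    return c + dt * M * laplacian(mu)

rng = np.random.default_rng(0)
c = 0.01 * rng.standard_normal((64, 64))   # small perturbation of a uniform field
total0 = c.sum()
for _ in range(100):
    c = cahn_hilliard_step(c)
# Total concentration is conserved because the update is a discrete divergence
```

Note the conservation property: because the right-hand side is a discrete Laplacian of μ, the total concentration is preserved to floating-point precision, which is exactly the physical constraint that makes spinodal decomposition a conserved-order-parameter problem.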

To validate the dataset's utility, we benchmarked two widely recognized ML architectures: U-Net and SegFormer. U-Net's encoder-decoder design with skip connections proved well-suited for capturing fine-grained spatial details, while SegFormer leveraged transformer-based mechanisms to capture both local and global features. U-Net consistently outperformed SegFormer in terms of prediction accuracy across various scenarios, demonstrating its effectiveness for this class of problems.
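To make the encoder-decoder-with-skip-connections idea concrete, here is a deliberately tiny U-Net-style module in PyTorch. This is our own illustrative sketch (names like `TinyUNet` are invented), far shallower than the networks benchmarked in the paper; it only shows the structural pattern of downsampling, upsampling, and concatenating the skip connection.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, channels=1, base=16):
        super().__init__()
        self.enc1 = conv_block(channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)   # concatenated skip doubles channels
        self.head = nn.Conv2d(base, channels, 1)

    def forward(self, x):
        e1 = self.enc1(x)                         # full-resolution features
        e2 = self.enc2(self.pool(e1))             # coarse, downsampled features
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d1)

net = TinyUNet()
frame = torch.randn(1, 1, 64, 64)   # one concentration field as input
pred = net(frame)                   # predicted future field, same spatial shape
```

The skip connection is what lets the decoder recover fine-grained interface detail that pooling would otherwise destroy, which is plausibly why this family of architectures suits phase field fields with sharp phase boundaries.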

A distinct feature of our approach is the incorporation of self-consistency mechanisms into the U-Net architecture. These mechanisms ensure that predictions remain physically plausible across time steps, greatly improving the accuracy of long-term trajectory predictions. For example, the U-Net achieved an absolute relative error (ARE) of 2.6% when predicting microstructural evolution, outperforming previous methods tested on different datasets. This integration of physical constraints highlights the potential for enhancing ML models' reliability and performance.
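For concreteness, one common way to compute an absolute relative error between a predicted and a ground-truth field is shown below; the exact normalization used in the paper may differ, so this is a hedged illustration of the kind of metric being reported.

```python
import numpy as np

def absolute_relative_error(pred, true, eps=1e-8):
    # One common definition: mean absolute error normalized by the
    # mean magnitude of the ground truth (eps guards against division by zero)
    return np.abs(pred - true).mean() / (np.abs(true).mean() + eps)

true = np.array([1.0, 2.0, 4.0])
pred = np.array([1.1, 1.9, 4.2])
are = absolute_relative_error(pred, true)   # roughly 0.057, i.e. ~5.7%
```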

By enabling rapid and accurate phase field predictions, our dataset and model open avenues for systematic parametric studies and multi-scale simulations that were previously computationally prohibitive. For battery materials research, such simulations are particularly impactful, offering insights into phase boundary movement, intra-particle stress evolution, and their implications for electrode performance and durability.

Our contributions include the benchmark dataset and a modular codebase that simplifies integration and extension for new research. By establishing a solid baseline and providing tools for data processing, ML model training, and evaluation, we aim to lower the barriers to entry for researchers integrating ML into phase field modeling. This resource facilitates fair comparisons across different algorithms, fostering accelerated development in the field.

In conclusion, our work addresses a critical need in phase field modeling by providing a standardized, open-access dataset for ML-based predictions. By merging the rigor of physics-based simulations with the efficiency of ML, we demonstrate that accelerated simulations are achievable without compromising accuracy. The insights from this work advance the state-of-the-art in phase field modeling and set the stage for broader applications of ML in computational materials science. We invite the community to build upon this foundation, exploring novel architectures, datasets, and methodologies to unlock the full potential of ML-accelerated simulations.
