Autonomous thermodynamically informed database generation for machine-learned interatomic potentials and application to magnesium

Vincent G. Fletcher, Albert P. Bartók, Livia B. Pártay

TL;DR

The paper presents a fully automated, thermodynamically informed approach to building training databases for interatomic potentials: Nested Sampling selects configurations by thermodynamic relevance, and the selected configurations are then evaluated with ab initio DFT. An Atomic Cluster Expansion potential is trained in iterative cycles on magnesium data spanning 0–600 GPa and 0–8000 K, achieving accurate phonons, elastic constants, defect energetics, and phase boundaries. The work demonstrates robust transferability and generality, enabling reliable predictions across wide regions of phase space at reduced computational cost. This framework addresses dataset bias and sampling inefficiency in MLIP construction, with broad implications for predictive materials modelling under extreme conditions.

Abstract

We propose a novel approach for constructing training databases for Machine-Learned Interatomic Potential (MLIP) models, specifically designed to capture phase properties across a wide range of conditions. The framework is uniquely appealing due to its ease of automation, its suitability for iterative learning, and its independence from prior knowledge of stable phases, avoiding bias towards pre-existing structural data. The approach uses Nested Sampling (NS) to explore the configuration space and generate thermodynamically relevant configurations, forming a database that undergoes ab initio Density Functional Theory (DFT) evaluation. We use the Atomic Cluster Expansion (ACE) architecture to fit a model on the resulting database. To demonstrate the efficiency of the framework, we apply it to magnesium, developing a model capable of accurately describing behaviour across pressure and temperature ranges of 0–600 GPa and 0–8000 K, respectively. We benchmark the model's performance by calculating phonon spectra and elastic constants, as well as the pressure-temperature phase diagram within this region. The results showcase the power of the framework to produce robust MLIPs while maintaining transferability and generality, at reduced computational cost.

Paper Structure

This paper contains 31 sections, 7 equations, 31 figures, 13 tables.

Figures (31)

  • Figure 1: Schematic workflow of the iterative potential fitting process. The cycle starts with generating configurations from NS. In the zeroth cycle this is done using an arbitrary initial potential, an EAM in the current work. A database is autonomously constructed, guided by the thermodynamic information and samples generated by NS. The database undergoes DFT evaluation and an MLIP is (re)fitted, acting as a new input to the next cycle for further refinement if needed. We repeated this cycle 5 times.
  • Figure 2: The enthalpy of configurations generated during NS in cycle 0 at 1 GPa. a shows the enthalpy of the samples plotted as a function of NS iterations, with the inset showing the average $Q_6$ and $W_6$ Steinhardt bond order parameters of the configurations, coloured by their associated temperature. b shows the enthalpy of samples plotted as a function of temperature, with snapshots of the highest and lowest enthalpy configurations selected for training shown. The 100 configurations, equally spaced in iteration number, that were added to the database are marked by red crosses. In cycle 0 all samples were generated using the EAM model.
  • Figure 3: Schematic representation of sampling with a model committee STD restriction. By sampling with the model committee STD restriction, the walkers avoid becoming trapped in holes of the PES during NS. Solid black and dashed blue lines represent the target ab initio PES and the MLIP PES, respectively. Blue circles indicate configurations generated during the sampling, with corresponding black circles showing the same configurations after evaluation by DFT, which are thereafter added to the database for MLIP training. The upper panel shows the corresponding uncertainty of the model, with the orange dashed line indicating the limit above which samples are rejected; a rejected sample is shown by the red circle on the PES.
  • Figure 4: Distribution of committee STD of samples and minimum interatomic distance within those samples, across cycles and pressures, with and without the maximum STD restriction enabled. We show how the committee STD (top three rows) and minimum bond length (bottom row) of samples change across pressures (red: 600 GPa, orange: 160 GPa, green: 20 GPa) and across cycles (cycle 2: columns a and b, cycle 3: columns c and d), both with (columns b and d) and without (columns a and c) the committee STD restriction, shown by the black dashed line. We highlight the unphysically short bond lengths and substantial STD values in column a and show how the STD restriction, applied in column b, corrects this behaviour. We also highlight the minimal effect of the STD restriction when physical samples are produced by comparing columns c and d.
  • Figure 5: The distribution of $\boldsymbol{Q_6}$ and $\boldsymbol{W_6}$ Steinhardt bond order parameters, for the samples at 600 GPa, across cycles, with and without the maximum STD restriction enabled. a shows the distribution from cycle 2 with the restriction enabled, b and c show the distributions from cycle 3 without and with the restriction enabled respectively. Samples are coloured by temperature with red crosses indicating the samples added to the database. We highlight the drastic change in distribution from cycle 2 to 3, and the minimal effect of the restriction through the similarity between b and c.
  • ...and 26 more figures
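
Two ideas from the captions above lend themselves to a small illustration: picking database entries equally spaced in NS iteration number (Figure 2) and rejecting samples whose committee STD exceeds a limit (Figure 3). The following is a minimal sketch, not the authors' implementation. The function names, the use of scalar energies as the committee quantity, the population standard deviation, and the toy committee are all assumptions made here for illustration; the captions do not specify which quantity the STD is computed on or how the limit is chosen.

```python
from statistics import pstdev
from typing import Callable, Sequence


def equally_spaced_indices(n_samples: int, n_select: int) -> list[int]:
    """Return n_select indices spread evenly over range(n_samples).

    Hypothetical helper mirroring 'equally spaced in iteration number'.
    """
    if n_samples <= 0 or n_select <= 0:
        return []
    if n_select >= n_samples:
        return list(range(n_samples))
    if n_select == 1:
        return [0]
    # Spacing between picks; always > 1 here, so rounded indices are distinct.
    step = (n_samples - 1) / (n_select - 1)
    return [round(i * step) for i in range(n_select)]


def committee_std(config: float, committee: Sequence[Callable[[float], float]]) -> float:
    """Spread of the committee's predictions for one configuration.

    Assumption: each model maps a configuration to a scalar energy, and the
    population standard deviation is used as the uncertainty measure.
    """
    return pstdev(model(config) for model in committee)


def accept_sample(config: float,
                  committee: Sequence[Callable[[float], float]],
                  max_std: float) -> bool:
    """Accept a proposed sample only if the committee STD is within the limit."""
    return committee_std(config, committee) <= max_std


# Toy committee: three 'models' that disagree more as the configuration
# value grows (purely illustrative, a float stands in for a structure).
toy_committee = [lambda x: 1.0 * x, lambda x: 1.1 * x, lambda x: 0.9 * x]

picked = equally_spaced_indices(n_samples=10, n_select=4)
low_uncertainty_ok = accept_sample(0.5, toy_committee, max_std=0.1)
high_uncertainty_ok = accept_sample(2.0, toy_committee, max_std=0.1)
```

In this toy setting, the configuration with small committee disagreement is accepted and the one with large disagreement is rejected, which is the behaviour the Figure 3 caption describes for walkers approaching poorly learned regions of the PES.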