
Accurate machine learning force fields via experimental and simulation data fusion

Sebastien Röcken, Julija Zavadlav

TL;DR

A fused data learning strategy that trains on both DFT calculations and experimental measurements can satisfy all target objectives concurrently, yielding a molecular model that is more accurate than models trained on a single data source.

Abstract

Machine Learning (ML)-based force fields are attracting ever-increasing interest due to their capacity to span the spatiotemporal scales of classical interatomic potentials at quantum-level accuracy. They can be trained based on high-fidelity simulations or experiments, the former being the common case. However, both approaches are impaired by scarce and erroneous data, resulting in models that either do not agree with well-known experimental observations or are under-constrained and only reproduce some properties. Here we leverage both Density Functional Theory (DFT) calculations and experimentally measured mechanical properties and lattice parameters to train an ML potential of titanium. We demonstrate that the fused data learning strategy can concurrently satisfy all target objectives, thus resulting in a molecular model of higher accuracy compared to the models trained with a single data source. The inaccuracies of DFT functionals at target experimental properties were corrected, while the investigated off-target properties remained largely unperturbed. Our approach is applicable to any material and can serve as a general strategy to obtain highly accurate ML potentials.
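
To make the fused data learning idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical two-parameter linear model in which the "DFT" labels constrain only the first parameter, while a single "experimental" observable constrains only the sum of both. All names (`X_DFT`, `Y_DFT`, `C_EXP`, `O_EXP`, `dft_step`, `exp_step`, `pretrain_dft`, `train_fused`) and all numbers are illustrative assumptions. In a real setting the observable would be estimated from a simulated trajectory rather than evaluated in closed form as it is here.

```python
import numpy as np

# Toy stand-ins (all hypothetical): a linear model with two parameters.
# The DFT labels constrain only theta[0]; the "experimental" observable
# constrains only the sum theta[0] + theta[1].
X_DFT = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])  # features of DFT samples
Y_DFT = np.array([2.0, 4.0, 6.0])                        # labels, consistent with theta[0] = 2
C_EXP = np.array([1.0, 1.0])                             # observable O = C_EXP @ theta
O_EXP = 5.0                                              # assumed "measured" value


def dft_step(theta, lr=0.1):
    """One gradient step on the mean-squared error against the DFT labels."""
    residual = X_DFT @ theta - Y_DFT
    grad = 2.0 * (X_DFT.T @ residual) / len(Y_DFT)
    return theta - lr * grad


def exp_step(theta, lr=0.1):
    """One gradient step on the squared error of the observable."""
    error = C_EXP @ theta - O_EXP
    grad = 2.0 * error * C_EXP
    return theta - lr * grad


def pretrain_dft(theta, epochs=200):
    """Train with the DFT objective only."""
    for _ in range(epochs):
        theta = dft_step(theta)
    return theta


def train_fused(theta, epochs=200):
    """Alternate one DFT step and one EXP step per epoch."""
    for _ in range(epochs):
        theta = dft_step(theta)
        theta = exp_step(theta)
    return theta


theta_pre = pretrain_dft(np.zeros(2))   # DFT-only model
theta_fused = train_fused(theta_pre)    # fused model, started from the DFT-only model

print("DFT pre-trained:", theta_pre, "observable:", C_EXP @ theta_pre)
print("DFT & EXP fused:", theta_fused, "observable:", C_EXP @ theta_fused)
```

In this toy problem the DFT-only parameters fit the DFT labels but miss the observable, because the second parameter is never constrained. Alternating the two update steps drives the parameters to values that fit both targets at once.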

Paper Structure

This paper contains 15 sections, 3 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Investigated models. The DFT pre-trained model (a) is trained only with the DFT trainer (d), which optimizes the parameters of the ML potential to match the reference DFT potential energy $\tilde{U}$, forces $\tilde{F}$, and virial $\tilde{V}$ for different atomic environments $S$. For the DFT, EXP sequential model (b), the ML potential is initialized with the parameters of the DFT pre-trained model and then trained with the EXP trainer (e) to reproduce the experimental observables $\tilde{O}$. The EXP trainer requires simulations because the observables are not direct outputs of the ML model but are computed as time averages over the simulated trajectory. The DFT & EXP fused model (c) is obtained by alternating between the DFT and EXP trainers, starting from the DFT pre-trained model.
  • Figure 2: Energy vs. volume for the hcp (a), fcc (b), and bcc (c) titanium crystal structures, computed for samples in the test DFT dataset. The predictions of the three models (DFT pre-trained; DFT, EXP sequential; DFT & EXP fused) are denoted with red, green, and blue points, respectively. DFT calculations are denoted with a black dashed line.
  • Figure 3: Bulk modulus (a), shear modulus (b), Poisson's ratio (c), and lattice constants $a$ (d) and $c$ (e) as a function of temperature for hcp titanium. The three models (DFT pre-trained; DFT, EXP sequential; DFT & EXP fused) are denoted with red, green, and blue lines with points, respectively. The experimental results are denoted with a black dashed line. Error bars denote the standard deviation computed via block-averaging with 10 blocks.
  • Figure 4: Phonon dispersion curves of hcp titanium for the DFT pre-trained (a); DFT, EXP sequential (b); and DFT & EXP fused (c) models. The predictions of the ML potential models agree well with the experimental measurements at 295 K (black dashed lines) [stassis1979lattice].
  • Figure 5: Radial distribution function (RDF, a), angular distribution function (ADF, b), and velocity autocorrelation function (VACF, c) for the DFT pre-trained (red); DFT, EXP sequential (green); and DFT & EXP fused (blue) models. The RDFs and ADFs are computed at 1965 K to enable direct comparison with the experimentally determined RDF (black, dashed) [holland2007short]. VACFs are evaluated at 2000 K to facilitate comparison with experiments for the self-diffusion coefficient.
  • ...and 1 more figures
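
The block-averaged error bars mentioned in the Figure 3 caption can be estimated along the following lines. This is a minimal sketch under stated assumptions, not the authors' code: the helper name `block_average` is hypothetical, the time series is split into 10 contiguous blocks of equal length (trailing samples that do not fill a block are dropped), and the reported spread is the sample standard deviation of the block means. Whether the paper uses this normalization or a standard error is not stated in the caption.

```python
import numpy as np


def block_average(samples, n_blocks=10):
    """Return the mean of the block means and their sample standard deviation.

    `samples` is a 1-D time series of an observable from a trajectory.
    """
    samples = np.asarray(samples, dtype=float)
    block_len = len(samples) // n_blocks
    if block_len == 0:
        raise ValueError("need at least one sample per block")
    # Drop trailing samples so every block has the same length.
    usable = samples[: block_len * n_blocks]
    block_means = usable.reshape(n_blocks, block_len).mean(axis=1)
    return block_means.mean(), block_means.std(ddof=1)


# Usage with a synthetic series (illustrative only).
series = np.arange(100.0)
mean, std = block_average(series, n_blocks=10)
print(mean, std)
```

Averaging within blocks before taking the spread reduces the effect of correlations between consecutive trajectory frames, which would make a naive per-frame standard deviation a poor measure of the uncertainty of the mean.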