Improving Reliability of Machine Learned Interatomic Potentials With Physics-Informed Pretraining

Qianyu Zheng; Victor Fung

TL;DR

This work presents a physics-informed pretraining strategy that uses simple physical potentials to improve the robustness and stability of graph-based MLIPs in MD simulations. Compared with the baselines, this pretraining consistently improves both prediction accuracy and MD stability.

Abstract

Machine learned interatomic potentials (MLIPs) have emerged as powerful tools for molecular dynamics (MD) simulations thanks to their competitive accuracy and computational efficiency. However, MLIPs often exhibit unphysical behavior when they encounter configurations that deviate significantly from their training data distribution. This leads to simulation instabilities and unreliable dynamics, which limits their usefulness for materials simulations. We present a physics-informed pretraining strategy that leverages simple physical potentials to improve the robustness and stability of graph-based MLIPs for MD simulations. We demonstrate this approach with a pretraining-finetuning pipeline in which MLIPs are first pretrained on data labelled with embedded atom model (EAM) potentials and subsequently finetuned on the quantum mechanical ground truth data. Evaluating across three diverse material systems (phosphorus, silica, and a subset of Materials Project) and three representative MLIP architectures (CGCNN, M3GNet, and TorchMD-NET), we find that this physics-informed pretraining consistently improves both prediction accuracy and stability in MD compared to the baselines.

Paper Structure (27 sections, 7 equations, 7 figures, 5 tables, 1 algorithm)

Introduction
Methodology
Machine Learned Interatomic Potentials
Datasets
Physics-Informed Pretraining Framework
Leveraging Physics-based Empirical Potentials
Implementation of the Empirical Potential
Effectiveness of the EAM Potential
Physics-Informed Pretraining Workflow
Benchmarking Suite for Trajectory Physicality
Molecular Dynamics Simulations
Metrics for MD Trajectory Physicality
Results
Experimental Setup
Physicality of MLIP-generated MD trajectories
...and 12 more sections

Figures (7)

Figure 1: Energy landscapes predicted by the CGCNN and TorchMD MLIPs and by the EAM potential for three types of interactions in the Silica dataset. Both MLIPs show clear unphysical patterns below 1 Å, and CGCNN also shows them above 6 Å.
Figure 2: Decomposition of the trained EAM potential for three types of diatomic interactions in the Silica dataset, with individual energy components shown as a function of interatomic distance. The plots display the contributions of the electron density (orange solid line), the Morse pairwise interaction (blue solid line), and the atomic energy (green solid line), together with the total EAM energy (black dashed line), for the three primary atom pair types in silica: (left) Si-O interactions, (center) O-O interactions, and (right) Si-Si interactions. All energies are shown relative to the cutoff radius of 8.0 Å.
Figure 3: EAM potential energy curves for key interatomic interactions across three benchmark datasets. The plots show the fitted EAM total energy as a function of interatomic distance for the most chemically significant atom pair interactions in each system: (left) Si-O, O-O and Si-Si interactions in the silica dataset, (center) P-P interactions in the phosphorus dataset, and (right) four representative interactions in the MPtrj subset.
Figure 4: Physics-informed pretraining workflow. The workflow begins by constructing the pretraining dataset from the original quantum mechanical dataset. Over n rounds of perturbation following the two-step-perturbation algorithm, atomic positions are systematically modified to generate diverse structural configurations. The perturbed structures are then labeled using the trained EAM potential. The machine learning interatomic potential (MLIP) is first pretrained on this EAM-labeled dataset, then finetuned on the original quantum mechanical data through weight sharing.
Figure 5: Radial distribution function comparison between DFT molecular dynamics and MLIP predictions for four representative MPtrj Subset structures. The plots show RDFs for four atomic pair interactions (Mn-O, Li-O, Mn-P, O-O) across four selected structures: (a) mp-863414-0, (b) mp-25829-4, (c) mp-766128-4, and (d) mp-31980-3. Black dashed lines represent reference DFT MD trajectories, while colored lines show MLIP predictions using different training approaches: orange curves show our proposed pretraining method, blue curves show SAM optimization results, and green curves represent the baseline vanilla TorchMD model.
...and 2 more figures
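
The energy decomposition described in the Figure 2 caption (electron density, Morse pair interaction, atomic energy, and their total) can be illustrated for an isolated pair of atoms. The following is only a minimal sketch under stated assumptions and not the paper's implementation: the exponential density, the square-root embedding function, the reading of "relative to the cutoff" as subtracting each component's value at 8.0 Å, and all parameter values in `PARAMS` are hypothetical choices made for illustration. Only the Morse form and the 8.0 Å cutoff come from the caption.

```python
import math

CUTOFF = 8.0  # Angstrom; cutoff radius quoted in the Figure 2 caption

# Hypothetical, unfitted parameters chosen only to make the sketch runnable.
PARAMS = {"depth": 1.0, "alpha": 2.0, "r0": 1.6, "amplitude": 1.0, "beta": 1.0}


def morse(r, depth, alpha, r0):
    """Standard Morse pair energy with its minimum of -depth at r = r0."""
    x = math.exp(-alpha * (r - r0))
    return depth * (x * x - 2.0 * x)


def density(r, amplitude, beta):
    """Electron density donated by one neighbor (assumed exponential decay)."""
    return amplitude * math.exp(-beta * r)


def embedding(rho):
    """Embedding energy; the square-root form is an assumption of this sketch."""
    return -math.sqrt(rho)


def components(r, params, e_atom=0.0):
    """Unshifted energy components of a two-atom system at separation r."""
    pair = morse(r, params["depth"], params["alpha"], params["r0"])
    rho = density(r, params["amplitude"], params["beta"])
    embed = 2.0 * embedding(rho)  # each atom sits in the other's density
    atomic = 2.0 * e_atom         # constant per-atom reference energy
    return {"pair": pair, "embedding": embed, "atomic": atomic,
            "total": pair + embed + atomic}


def shifted_components(r, params, e_atom=0.0):
    """Components measured relative to their values at the cutoff radius."""
    here = components(r, params, e_atom)
    ref = components(CUTOFF, params, e_atom)
    return {key: here[key] - ref[key] for key in here}


if __name__ == "__main__":
    for r in (0.5, 1.6, 3.0, 8.0):
        print(r, shifted_components(r, PARAMS))
```

With these illustrative parameters the total energy is strongly repulsive at short range and goes to zero at the cutoff, which is the qualitative behavior the pretraining labels are meant to supply where quantum mechanical training data is sparse.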
