Physics-Informed Weakly Supervised Learning for Interatomic Potentials
Makoto Takamoto, Viktor Zaverkin, Mathias Niepert
TL;DR
This work tackles the challenge of training accurate and robust interatomic potentials when labeled data are sparse by introducing physics-informed weakly supervised learning (PIWSL). PIWSL couples a Taylor-expansion-based consistency loss (PITC) with a physics-inspired spatial consistency loss (PISC) to enforce the physical relationship between energies and conservative forces, even when force labels are limited. Across multiple datasets and model architectures, the approach reduces energy and force errors, often by a factor of two, improves the stability of molecular dynamics (MD) simulations, and improves the fine-tuning of foundation models on sparse, highly accurate ab initio data. By leveraging approximate energy labels and perturbation-based consistency, PIWSL reduces data requirements while promoting physically plausible potential-energy surfaces, which supports more scalable and reliable atomistic simulations.
Abstract
Machine learning plays an increasingly important role in computational chemistry and materials science, complementing computationally intensive ab initio and first-principles methods. Despite their utility, machine-learning models often lack generalization capability and robustness during atomistic simulations, yielding unphysical energy and force predictions that hinder their real-world application. We address this challenge by introducing a physics-informed, weakly supervised approach for training machine-learned interatomic potentials (MLIPs). We introduce two novel loss functions: one extrapolates the potential energy via a Taylor expansion, and the other uses the concept of conservative forces. Our approach improves the accuracy of MLIPs trained on sparse data sets and reduces the need to pre-train computationally demanding models on large data sets. In particular, extensive experiments show reduced energy and force errors, often lower by a factor of two, for various baseline models and benchmark data sets. Moreover, MLIPs trained with the proposed weakly supervised loss are more robust during molecular dynamics (MD) simulations. Finally, our approach improves the fine-tuning of foundation models on sparse, highly accurate ab initio data. An implementation of our method and scripts for executing experiments are available at https://github.com/nec-research/PICPS-ML4Sci.
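The two loss ideas named above can be illustrated with a minimal sketch. It relies only on the standard first-order relation E(r + δ) ≈ E(r) − F(r)·δ with F = −∇E. Everything else is an assumption made for illustration and is not taken from the paper or its repository: the toy harmonic pair potential standing in for an MLIP, the function names, the squared-error form of the losses, and the choice of perturbations.

```python
import numpy as np

def toy_energy(pos, stiffness=1.0, r0=1.0):
    """Harmonic pair energy; a hypothetical stand-in for an MLIP energy head."""
    diff = pos[:, None, :] - pos[None, :, :]          # r_i - r_j, shape (N, N, 3)
    dist = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(pos), k=1)               # count each pair once
    return 0.5 * stiffness * np.sum((dist[iu] - r0) ** 2)

def toy_forces(pos, stiffness=1.0, r0=1.0):
    """Analytic conservative forces F = -dE/dr for toy_energy."""
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, 1.0)                       # avoid 0/0 on the diagonal
    coeff = stiffness * (dist - r0) / dist
    np.fill_diagonal(coeff, 0.0)                      # no self-interaction
    return -np.sum(coeff[:, :, None] * diff, axis=1)

def taylor_consistency_loss(energy_fn, force_fn, pos, delta):
    """Compare E(r + delta) with the first-order weak label E(r) - F(r).delta."""
    weak_label = energy_fn(pos) - np.sum(force_fn(pos) * delta)
    return (energy_fn(pos + delta) - weak_label) ** 2

def spatial_consistency_loss(energy_fn, force_fn, pos, delta_a, delta_b):
    """Compare E(r + delta_a) with its estimate extrapolated from r + delta_b."""
    pos_b = pos + delta_b
    via_b = energy_fn(pos_b) - np.sum(force_fn(pos_b) * (delta_a - delta_b))
    return (energy_fn(pos + delta_a) - via_b) ** 2

# Example: four atoms, perturbed by a small step along the force direction.
pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.3, 0.0], [0.0, 0.0, 0.8]])
forces = toy_forces(pos)
delta = 1e-3 * forces / np.linalg.norm(forces)

print("Taylor consistency loss :", taylor_consistency_loss(toy_energy, toy_forces, pos, delta))
print("Spatial consistency loss:", spatial_consistency_loss(toy_energy, toy_forces, pos, delta, 0.5 * delta))
```

Both losses need only energies and forces predicted by the model at perturbed configurations, so no additional reference labels are required. In this sketch they are close to zero because the toy forces are the exact negative gradient of the toy energy; a model whose forces are inconsistent with its energy would incur a larger penalty.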
