From Evaluation to Design: Using Potential Energy Surface Smoothness Metrics to Guide Machine Learning Interatomic Potential Architectures
Ryan Liu, Eric Qu, Tobias Kreiman, Samuel M. Blau, Aditi S. Krishnapriyan
TL;DR
The paper addresses the mismatch between the regression accuracy of machine learning interatomic potentials (MLIPs) and the physical smoothness of the quantum potential energy surface (PES), a mismatch that can destabilize molecular dynamics (MD) simulations. It introduces the Bond Smoothness Characterization Test (BSCT), a computationally efficient benchmark that probes PES smoothness along controlled bond deformations, and defines the Force Smoothness Deviation (FSD) as a fast proxy for MD stability. Using a neutral Transformer-like testbed (MinDScAIP), the authors demonstrate that targeted architectural refinements, including Diff-kNN, controllable Gaussian smearing, and temperature-controlled attention, reduce nonphysical PES features and improve both near- and far-from-equilibrium performance. BSCT is shown to be a practical in-the-loop design proxy that helps MLIP developers identify and mitigate physical challenges not captured by conventional benchmarks, with broader implications for reliable atomistic simulations. The work also provides evidence that combining physics-based evaluation with careful architecture design yields MLIPs that balance accuracy, stability, and scalability.
Abstract
Machine Learning Interatomic Potentials (MLIPs) sometimes fail to reproduce the physical smoothness of the quantum potential energy surface (PES), leading to erroneous behavior in downstream simulations that standard energy and force regression evaluations can miss. Existing evaluations, such as microcanonical molecular dynamics (MD), are computationally expensive and primarily probe near-equilibrium states. To improve evaluation metrics for MLIPs, we introduce the Bond Smoothness Characterization Test (BSCT). This efficient benchmark probes the PES via controlled bond deformations and detects non-smoothness, including discontinuities, artificial minima, and spurious forces, both near and far from equilibrium. We show that BSCT correlates strongly with MD stability at a fraction of the cost of MD. To demonstrate how BSCT can guide iterative model design, we use an unconstrained Transformer backbone as a testbed, illustrating how refinements such as a new differentiable $k$-nearest neighbors algorithm and temperature-controlled attention reduce artifacts identified by our metric. By systematically optimizing model design based on BSCT, the resulting MLIP simultaneously achieves low conventional energy and force (E/F) regression error, stable MD simulations, and robust atomistic property predictions. Our results establish BSCT as both a validation metric and an "in-the-loop" model design proxy that alerts MLIP developers to physical challenges that cannot be efficiently evaluated by current MLIP benchmarks.
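To make the idea of probing the PES along a controlled bond deformation concrete, the following is a minimal sketch, not the BSCT procedure or the FSD definition from the paper (neither is specified in this summary). It assumes a toy one-dimensional Morse potential as the smooth reference and a hypothetical "model" PES with a small energy step, loosely mimicking a neighbor abruptly leaving a hard cutoff. All function names, parameters, and the two diagnostics (local-minimum count and largest force jump between adjacent scan points) are illustrative assumptions.

```python
import numpy as np

def morse_energy(r, d_e=4.5, a=1.9, r_e=0.96):
    """Smooth reference PES: Morse potential with toy (hypothetical) parameters."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def cutoff_artifact_energy(r, r_cut=1.8, drop=0.05):
    """Hypothetical non-smooth PES: the energy drops by `drop` past r_cut."""
    return morse_energy(r) - drop * (r > r_cut)

def bond_scan(energy_fn, r_min=0.7, r_max=3.0, n=461):
    """Evaluate energy and finite-difference force along a bond-length scan."""
    r = np.linspace(r_min, r_max, n)
    e = energy_fn(r)
    f = -np.gradient(e, r)  # force along the bond, F = -dE/dr
    return r, e, f

def count_local_minima(e):
    """Count interior grid points strictly lower than both neighbors."""
    interior = e[1:-1]
    return int(np.sum((interior < e[:-2]) & (interior < e[2:])))

def max_force_jump(f):
    """Largest absolute change in force between adjacent scan points."""
    return float(np.max(np.abs(np.diff(f))))

_, e_smooth, f_smooth = bond_scan(morse_energy)
_, e_rough, f_rough = bond_scan(cutoff_artifact_energy)

print("minima (smooth, artifact):",
      count_local_minima(e_smooth), count_local_minima(e_rough))
print("max force jump (smooth, artifact):",
      max_force_jump(f_smooth), max_force_jump(f_rough))
```

In this toy setting the energy step introduces an artificial second minimum and a force spike that the smooth reference lacks. In practice, `energy_fn` would be replaced by an MLIP evaluated on a structure with one bond stretched or compressed, and forces would come from the model rather than from finite differences.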
