Physically Interpretable Interatomic Potentials via Symbolic Regression and Reinforcement Learning
Bilvin Varughese, Troy D. Loeffler, Suvo Banik, Aditya Koneru, Sukriti Manna, Karthik Balasubramanian, Rohit Batra, Mathew J. Cherukara, Orcun Yildiz, Tom Peterka, Bobby G. Sumpter, Subramanian K. R. S. Sankaranarayanan
TL;DR
This work demonstrates physically interpretable interatomic potentials obtained by symbolic regression (SR) guided by reinforcement learning. By coupling equation learner networks (EqNN) with continuous-action Monte Carlo Tree Search (c-MCTS), the authors derive two Cu potentials, SR1 and SR2. These are more accurate than the Sutton-Chen embedded-atom method (SC-EAM) potential for energies, forces, equations of state, phonons, elastic constants, defect and surface energetics, and melting dynamics. Because they are explicit analytic expressions, they also remain interpretable. The approach combines DFT-derived training data from nested ensemble sampling with a hybrid global–local optimization workflow (tree search plus gradient descent). This workflow explores function space and converges on robust, transferable potentials. The resulting SR models approach DFT fidelity at a computational cost close to that of classical potentials, which enables large-scale simulations with reliable thermomechanical transferability. Their analytic forms also offer insight into the underlying interatomic interactions. These findings suggest a practical pathway toward fast, accurate, and interpretable interatomic potentials across materials and thermodynamic conditions.
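The equation learner idea referenced above can be illustrated with a single layer. An affine map feeds a bank of unary primitive functions and pairwise products, so a trained, sparsified network can be read back as a closed-form expression. The sketch below is hypothetical. The primitive set (identity, exp, square), the names `linear` and `eql_layer`, and the layer layout follow the generic equation-learner design, not the specific architecture used for SR1 and SR2, which is not described here.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical primitive set chosen for illustration only; the actual
# primitives used to derive SR1/SR2 are not specified in this section.
UNARY: List[Callable[[float], float]] = [
    lambda z: z,      # identity
    math.exp,         # exponential
    lambda z: z * z,  # square
]


def linear(x: Sequence[float], W: Sequence[Sequence[float]],
           b: Sequence[float]) -> List[float]:
    """Affine map z = W x + b (W given as a list of rows)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


def eql_layer(x: Sequence[float], W: Sequence[Sequence[float]],
              b: Sequence[float], n_mult: int) -> List[float]:
    """One equation-learner layer (sketch).

    The first len(UNARY) entries of z = W x + b pass through the unary
    primitives; the remaining 2 * n_mult entries are multiplied pairwise.
    """
    z = linear(x, W, b)
    n_unary = len(UNARY)
    if len(z) != n_unary + 2 * n_mult:
        raise ValueError("W must have len(UNARY) + 2 * n_mult rows")
    # Apply each unary primitive to its own pre-activation.
    out = [f(zi) for f, zi in zip(UNARY, z[:n_unary])]
    # Multiplication units: consecutive pairs of the remaining pre-activations.
    prods = z[n_unary:]
    out += [prods[2 * k] * prods[2 * k + 1] for k in range(n_mult)]
    return out
```

Consider a single interatomic distance x = [r], with W = [[1], [-1], [1], [1], [1]], zero biases, and n_mult = 1. The layer then returns [r, exp(-r), r², r²]. A sparse linear readout over such features corresponds to an explicit formula in r. The search over which weights survive is what the tree search and gradient descent are used for.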
Abstract
The development of next-generation molecular simulation models requires moving beyond pre-defined functional forms toward machine learning (ML) techniques that directly capture multiscale physics. Here, we demonstrate such an approach, using symbolic regression (SR) with equation learner networks and a reinforcement learning search engine to derive interpretable equations for interatomic interactions. Training data were generated through nested ensemble sampling with density functional theory (DFT) energetics and span crystalline to highly disordered states. The learner network was optimized with continuous-action Monte Carlo Tree Search (MCTS) combined with gradient descent, which enables efficient exploration of function space. For copper as a representative transition metal, an unconstrained search produced models that outperformed fixed-form Sutton-Chen EAM potentials. The SR-derived models (SR1 and SR2) reproduced key material properties with significantly improved accuracy. These properties include lattice constants, cohesive energies, equations of state, elastic constants, phonon dispersion, defect formation energies, surface/bulk energetics, and phase transformation. Furthermore, stringent melting simulations using two-phase solid-amorphous interfaces confirmed that the SR models accurately capture the interplay of vibrational entropy, cohesive energy, and structural dynamics, surpassing SC-EAM in both qualitative and quantitative predictions. These results highlight the potential of SR to deliver fast, accurate, flexible, and physically meaningful potentials, advancing predictive modeling across scales.
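For comparison, the fixed functional form that the SR models are benchmarked against is the Sutton-Chen EAM energy:

E = ε Σ_i [ (1/2) Σ_{j≠i} (a/r_ij)^n - c √ρ_i ]

Here the local density is ρ_i = Σ_{j≠i} (a/r_ij)^m. Only the parameters ε, a, c, n, and m are fitted, and the shape of the functions is fixed. That is the constraint the unconstrained SR search removes. The sketch below evaluates this energy for a small, non-periodic cluster. The function name and the optional distance cutoff are illustrative assumptions. Published Cu parameters should come from the original Sutton-Chen parameterization, not from the placeholder values used in the example.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sutton_chen_energy(positions: Sequence[Vec3], eps: float, a: float,
                       c: float, n: int, m: int,
                       cutoff: float = math.inf) -> float:
    """Total Sutton-Chen energy of a finite cluster (no periodic images).

    The cutoff is an optional illustrative assumption; the original form
    sums over all pairs.
    """
    total = 0.0
    for i, ri in enumerate(positions):
        pair = 0.0  # sum_j (a / r_ij)^n, the repulsive pair term
        rho = 0.0   # sum_j (a / r_ij)^m, the local density
        for j, rj in enumerate(positions):
            if i == j:
                continue
            r = math.dist(ri, rj)
            if r > cutoff:
                continue
            pair += (a / r) ** n
            rho += (a / r) ** m
        # Half the pair term (each pair counted twice) minus the embedding term.
        total += 0.5 * pair - c * math.sqrt(rho)
    return eps * total
```

For a dimer at separation r = a, each atom contributes 1/2 - c, so the total is ε(1 - 2c). With ε = 1 and c = 2 this gives -3. A production implementation would use periodic boundaries and neighbor lists instead of this O(N²) double loop.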
