Physically Interpretable Interatomic Potentials via Symbolic Regression and Reinforcement Learning

Bilvin Varughese, Troy D. Loeffler, Suvo Banik, Aditya Koneru, Sukriti Manna, Karthik Balasubramanian, Rohit Batra, Mathew J. Cherukara, Orcun Yildiz, Tom Peterka, Bobby G. Sumpter, Subramanian K. R. S. Sankaranarayanan

TL;DR

This work demonstrates physically interpretable interatomic potentials obtained by symbolic regression (SR) guided by reinforcement learning. By coupling equation learner networks (EqNN) with continuous-action Monte Carlo Tree Search (c-MCTS), the authors derive two Cu potentials, SR1 and SR2, that surpass the Sutton-Chen EAM (SC-EAM) in accuracy for energies, forces, equations of state, phonons, elastic constants, defect and surface energetics, and melting dynamics, while remaining interpretable through explicit analytic expressions. The approach combines DFT-derived training data, nested ensemble sampling, and a hybrid global–local optimization workflow to explore function space and converge on robust, transferable potentials. The resulting SR models achieve near-DFT fidelity at near-classical computational cost, enabling large-scale simulations with reliable thermomechanical transferability and physical forms that illuminate the underlying interatomic interactions. These findings suggest a practical pathway to designing fast, accurate, and interpretable interatomic potentials across materials and thermodynamic conditions.

Abstract

The development of next-generation molecular simulation models requires moving beyond pre-defined functional forms toward machine learning (ML) techniques that directly capture multiscale physics. Here, we demonstrate such an approach using symbolic regression (SR) with equation learner networks and a reinforcement learning search engine to derive interpretable equations for interatomic interactions. Training data were generated through nested ensemble sampling with density functional theory (DFT) energetics, spanning crystalline to highly disordered states. The optimization of the learner network employed continuous-action Monte Carlo Tree Search (MCTS) combined with gradient descent, enabling efficient exploration of function space. For copper as a representative transition metal, an unconstrained search produced models that outperformed fixed-form Sutton-Chen EAM (SC-EAM) potentials. The SR-derived models (SR1 and SR2) reproduced key material properties (lattice constants, cohesive energies, equations of state, elastic constants, phonon dispersion, defect formation energies, surface/bulk energetics, and phase transformations) with significantly improved accuracy. Furthermore, stringent melting simulations using two-phase solid-amorphous interfaces confirmed that SR models accurately capture the interplay of vibrational entropy, cohesive energy, and structural dynamics, surpassing SC-EAM in both qualitative and quantitative predictions. This highlights the potential of SR to deliver fast, accurate, flexible, and physically meaningful potentials, advancing predictive modeling across scales.
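For orientation, the fixed-form baseline mentioned in the abstract can be sketched in a few lines. The block below evaluates the standard textbook Sutton-Chen EAM energy, E = ε Σ_i [ ½ Σ_{j≠i} (a/r_ij)^n − c √ρ_i ] with ρ_i = Σ_{j≠i} (a/r_ij)^m, for an isolated cluster of atoms. It is a minimal sketch and not code from the paper: the function name `sutton_chen_energy`, the absence of periodic boundaries and cutoffs, and the parameter values in the usage lines are all illustrative assumptions.

```python
import numpy as np


def sutton_chen_energy(positions, eps, a, c, n, m):
    """Total Sutton-Chen EAM energy of an isolated cluster.

    Hypothetical helper for illustration: no periodic images, no cutoff.
    positions : (N, 3) array of Cartesian coordinates
    eps, a, c, n, m : Sutton-Chen parameters (energy scale, length scale,
                      embedding prefactor, pair exponent, density exponent)
    """
    pos = np.asarray(positions, dtype=float)

    # Pairwise distance matrix r_ij
    diff = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(diff, axis=-1)

    # a / r_ij for i != j; the diagonal stays zero so it drops out of the sums
    off_diag = ~np.eye(len(pos), dtype=bool)
    ratio = np.zeros_like(r)
    ratio[off_diag] = a / r[off_diag]

    pair = 0.5 * np.sum(ratio ** n, axis=1)  # repulsive pair term per atom
    rho = np.sum(ratio ** m, axis=1)         # local density per atom

    # Square-root embedding term, the fixed form used by SC-EAM
    return eps * np.sum(pair - c * np.sqrt(rho))


# Usage: a dimer at separation r = a with made-up parameters.
# Analytically E = eps * (1 - 2c), which is -3.0 for eps = 1 and c = 2.
e_dimer = sutton_chen_energy([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                             eps=1.0, a=1.0, c=2.0, n=9, m=6)
```

The only many-body ingredient here is the square-root embedding of the local density ρ_i. According to the figure captions, SR1 keeps that fixed embedding while SR2 allows additional basis terms alongside the square root.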

Paper Structure

This paper contains 12 sections, 10 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Training process for learning the Sutton-Chen equation using EqNN. (a) Energy and force correlations between the reference values calculated with SC-EAM and those predicted by EqNN, showing very high correlation. (b) Comparison of the pair term and the embedding term of the target equation with those predicted by EqNN over all relevant distances. (c) Comparison of the reference and predicted equations using the EqNN.
  • Figure 2: Comparison of SC EAM predictions with DFT reference data for structures used to train the EqNN corresponding to the SC Equation. (a) Energy and (b) force predictions are shown across configurations ranging from near-ground state (left) to far-from-ground state (right). As the configurations deviate from equilibrium, both energy and force prediction errors (MAE) increase significantly.
  • Figure 3: Comparison of SR1 EAM predictions with DFT reference data for the dataset used to train the EqNN, using the same fixed embedding as the SC-EAM formalism. (a) Energy and (b) force predictions are shown across configurations ranging from near-ground state (left) to far-from-ground state (right). SR1 EAM exhibits significantly improved correlation with DFT energies and forces across all regimes compared to SC-EAM.
  • Figure 4: (a) Uniaxial equation of state (EOS) under compressive/tensile strain: SR1 (red) closely follows reference DFT (Ref; gray/black) across the full strain window, with a mean absolute error (MAE) of $5.26\,\mathrm{meV/atom}$. (b) Volumetric EOS: SR1 maintains strong correlation with DFT (MAE $=2.89\,\mathrm{meV/atom}$). (c) Shear/tetragonal deformation: SR1 again agrees well with DFT (MAE $=5.4\,\mathrm{meV/atom}$). Insets in (a–c) depict the applied deformation modes. (d) Phonon dispersion along $\Gamma\!-\!X\!-\!U\!-\!K\!-\!\Gamma\!-\!L\!-\!W\!-\!X$: SR1 reproduces the DFT branches with good agreement, particularly the acoustic slopes near $\Gamma$; no imaginary modes are observed and a slight under-prediction of frequencies is noted. All energies are reported as $\Delta E$ per atom referenced to unstrained fcc.
  • Figure 5: Comparison of SR2 EAM predictions with DFT reference data for the dataset used to train the EqNN, using an embedding function whose basis set includes additional terms beyond the square-root term. (a) Energy and (b) force predictions are shown across configurations ranging from near-ground state (left) to far-from-ground state (right). The energy range spanned by the structures widens from left to right, and SR2 shows higher correlation than SR1 in both energies and forces across all energy ranges.
  • ...and 7 more figures