Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials
Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Sajid Mannan, N. M. Anoop Krishnan
TL;DR
This work argues that regressing energies and forces on DFT trajectories is insufficient for universal interatomic potentials intended for device-scale simulations. It outlines three pillars: (i) training data from higher-accuracy methods such as CCSD(T) to replace or augment DFT labels, (ii) MLIP metrology combining large-scale benchmarking, visualization of energy landscapes, and interpretability analyses, and (iii) computationally efficient inference workflows suitable for large-scale MD. The authors detail the limitations of DFT, the sparsity of data across materials, and the need to align simulations with experimental observables, and they advocate open-source data generation and differentiable simulation tooling. They conclude with a call for hardware-software co-design and data-centric MLIP development to realize quantum-accurate, device-scale predictions in real-world materials systems.
Abstract
Universal Machine Learning Interatomic Potentials (MLIPs) enable accelerated simulations for materials discovery. However, current research efforts fail to make impactful use of MLIPs due to: 1. Overreliance on Density Functional Theory (DFT) for MLIP training data creation; 2. MLIPs' inability to reliably and accurately perform large-scale molecular dynamics (MD) simulations for diverse materials; 3. Limited understanding of MLIPs' underlying capabilities. To address these shortcomings, we argue that MLIP research efforts should prioritize: 1. Employing more accurate simulation methods (e.g., Coupled Cluster Theory) for large-scale MLIP training data creation, covering a wide range of materials design spaces; 2. Creating MLIP metrology tools that leverage large-scale benchmarking, visualization, and interpretability analyses to provide a deeper understanding of MLIPs' inner workings; 3. Developing computationally efficient MLIPs to execute MD simulations that accurately model a broad set of materials properties. Together, these interdisciplinary research directions can help further the real-world application of MLIPs to accurately model complex materials at device scale.
