
Energy & Force Regression on DFT Trajectories is Not Enough for Universal Machine Learning Interatomic Potentials

Santiago Miret, Kin Long Kelvin Lee, Carmelo Gonzales, Sajid Mannan, N. M. Anoop Krishnan

TL;DR

This work argues that regressing energies and forces on DFT trajectories is insufficient for universal interatomic potentials intended for device-scale simulations. It outlines three pillars: (i) higher-accuracy training data such as CCSD(T) to replace or augment DFT labels, (ii) MLIP metrology combining large-scale benchmarking, visualization of energy landscapes, and interpretability analyses, and (iii) computationally efficient inference workflows suitable for MD at large scales. The authors detail limitations of DFT, data sparsity across materials, and the need to align simulations with experimental observables, advocating open-source data generation and differentiable simulation tooling. They conclude with a call for hardware-software co-design and data-centric MLIP development to realize quantum-accurate, device-scale predictions in real-world materials systems.
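For readers unfamiliar with the training objective being critiqued, the following is a minimal sketch of a typical energy-and-force regression loss for one structure. It combines the squared error in per-atom energy with the mean squared error over all force components. The weights `w_energy` and `w_force` and the per-atom energy normalization are illustrative assumptions, not the loss used by any specific MLIP discussed in the paper.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def energy_force_loss(
    e_pred: float,
    e_ref: float,
    f_pred: Sequence[Vec3],
    f_ref: Sequence[Vec3],
    w_energy: float = 1.0,
    w_force: float = 1.0,
) -> float:
    """Weighted energy + force regression loss for a single structure (illustrative)."""
    n_atoms = len(f_ref)
    if n_atoms == 0 or len(f_pred) != n_atoms:
        raise ValueError("force lists must be non-empty and of equal length")
    # Squared error of the per-atom energy, so large cells do not dominate the loss.
    energy_term = ((e_pred - e_ref) / n_atoms) ** 2
    # Sum of squared errors over every Cartesian force component of every atom.
    force_sq = sum(
        (fp - fr) ** 2
        for atom_pred, atom_ref in zip(f_pred, f_ref)
        for fp, fr in zip(atom_pred, atom_ref)
    )
    # Mean over the 3 * n_atoms force components.
    force_term = force_sq / (3 * n_atoms)
    return w_energy * energy_term + w_force * force_term
```

A low value of this loss on held-out DFT frames is exactly the kind of error metric the authors argue is insufficient on its own: it does not show whether MD driven by the model stays stable or reproduces experimental observables.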

Abstract

Universal Machine Learning Interatomic Potentials (MLIPs) enable accelerated simulations for materials discovery. However, current research efforts fall short of using MLIPs to full effect because of: 1. Overreliance on Density Functional Theory (DFT) for creating MLIP training data; 2. MLIPs' inability to reliably and accurately perform large-scale molecular dynamics (MD) simulations for diverse materials; 3. Limited understanding of MLIPs' underlying capabilities. To address these shortcomings, we argue that MLIP research efforts should prioritize: 1. Employing more accurate simulation methods (e.g., Coupled Cluster Theory) to create large-scale MLIP training data covering a wide range of materials design spaces; 2. Creating MLIP metrology tools that leverage large-scale benchmarking, visualization, and interpretability analyses to provide a deeper understanding of MLIPs' inner workings; 3. Developing computationally efficient MLIPs that can run MD simulations accurately modeling a broad set of materials properties. Together, these interdisciplinary research directions can help advance the real-world application of MLIPs to accurately model complex materials at device scale.
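To make the cost side of the first recommendation concrete: conventional semilocal Kohn-Sham DFT scales roughly as O(N^3) in system size N, whereas canonical CCSD(T) scales as O(N^7). The sketch below compares how nominal cost grows when a system is enlarged. It deliberately ignores prefactors, basis-set choice, and linear-scaling or local-correlation variants, so it shows asymptotic trends only, not real runtimes.

```python
# Nominal asymptotic scaling exponents for canonical implementations.
# Prefactors, basis sets, and reduced-scaling variants are ignored.
SCALING_EXPONENT = {
    "DFT (semilocal)": 3,
    "MP2": 5,
    "CCSD": 6,
    "CCSD(T)": 7,
}


def relative_cost(method: str, size_factor: float) -> float:
    """Factor by which nominal cost grows when system size is multiplied by size_factor."""
    if size_factor <= 0:
        raise ValueError("size_factor must be positive")
    return size_factor ** SCALING_EXPONENT[method]


if __name__ == "__main__":
    for method in SCALING_EXPONENT:
        # Effect of doubling the number of atoms (or basis functions).
        print(f"{method:16s} x{relative_cost(method, 2.0):.0f}")
```

Doubling system size multiplies nominal CCSD(T) cost by 2^7 = 128 versus 2^3 = 8 for semilocal DFT, which illustrates why generating large-scale coupled-cluster training data is a substantial computational undertaking.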

Paper Structure

This paper contains 14 sections and 6 figures.

Figures (6)

  • Figure 1: Overview of Machine Learning Interatomic Potentials (MLIP) requirements for device scale modeling. Current research focuses mainly on bulk structures in ideal conditions with regression-based training and error metric evaluation. To enable materials foundation models, we require higher quality training datasets that use more accurate simulation methods like Coupled Cluster Theory. MLIPs, in turn, should be evaluated as part of atomistic simulations of real-world materials in application-informed conditions (e.g., defects, standard temperature & pressure, etc.). To reach MLIP-accelerated device scale modeling with quantum mechanical accuracy, we require new datasets and evaluation methods for complex materials systems in modern devices (e.g., 2D Material Transistors with multiple layers of distinct materials with specific functions), computational acceleration for proper inference, and comprehensive MLIP metrology.
  • Figure 2: An artistic interpretation of Jacob's ladder perdew2005prescription, extended to include wavefunction methods. Climbing the ladder generally improves both accuracy and precision, at the cost of increased time complexity. We stress that, while method choice for atomistic systems involves many nuances, particularly for multireference methods like CASSCF roos1980complete, these accuracy trends generally hold.
  • Figure 3: Frequency of elements in the MPtrj dataset. The color bar uses a logarithmic scale from low to high values, and the numbers indicate how often each element appears in the MPtrj dataset.
  • Figure 4: Schematic of length and time scales relevant to materials modeling. Annotations indicate approximate computational requirements; each block region corresponds to the effort required to complete a simulation at that scale in a "timely" fashion. "Routine" and "feasible" refer to wall times on the order of hours to days, while "difficult" can extend from weeks to months depending on the scale of distributed computing (e.g., the 2023 Gordon Bell prize submission by kozinsky2023scaling). "Unreasonable" means technically possible but practically improbable due to the collective compute and engineering efforts required; it corresponds to the device scale of an Intel 8086 (a microprocessor from the late 1970s), assuming a representative ~30,000 atoms per transistor.
  • Figure 5: Training and inference times for different MLIP architectures on the LiPS dataset, based on an analysis from bihani2024egraffbench. The MLIP architectures compared are MACE batatia2023foundation, BotNet batatia2025design, Allegro musaelian2023learning, Equiformer liao2023equiformer, and NequIP batzner20223; none of them matches the inference speed of the classical BKS potential van1990force.
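The "unreasonable" device-scale estimate in Figure 4 is simple arithmetic. The sketch below multiplies the caption's ~30,000 atoms per transistor by an approximate transistor count for the Intel 8086 (about 29,000, a commonly cited figure that is an assumption here, not a number from the paper), then converts a hypothetical MD throughput into wall-clock time. The throughput value `1e7` atom-steps per second is a placeholder; substitute a measured figure for a given MLIP and hardware.

```python
# Assumptions (not from the paper): ~29,000 transistors in the Intel 8086,
# and a hypothetical MLIP throughput expressed in atom-steps per second.
ATOMS_PER_TRANSISTOR = 30_000      # representative value from the Figure 4 caption
INTEL_8086_TRANSISTORS = 29_000    # commonly cited approximate transistor count
SECONDS_PER_DAY = 86_400


def device_atom_count(transistors: int, atoms_per_transistor: int) -> int:
    """Total number of atoms needed to represent a device atomistically."""
    return transistors * atoms_per_transistor


def md_wall_time_days(n_atoms: int, n_steps: int, atom_steps_per_second: float) -> float:
    """Wall-clock days to run n_steps of MD on n_atoms at a given throughput."""
    seconds = n_atoms * n_steps / atom_steps_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    atoms = device_atom_count(INTEL_8086_TRANSISTORS, ATOMS_PER_TRANSISTOR)
    print(f"atoms: {atoms:.2e}")  # atoms: 8.70e+08
    # 1 ns of MD at a 1 fs timestep is 1e6 steps; the throughput is hypothetical.
    days = md_wall_time_days(atoms, 1_000_000, 1e7)
    print(f"days for 1 ns: {days:.0f}")
```

Under these placeholder numbers, a single nanosecond of MD for an 8086-scale device takes on the order of a thousand days, which conveys why the caption labels this regime "unreasonable" and why the authors emphasize computationally efficient inference.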