Implicit Delta Learning of High Fidelity Neural Network Potentials
Stephan Thaler, Cristian Gabellini, Nikhil Shenoy, Prudencio Tossou
TL;DR
IDLe is an end-to-end multi-task framework that uses cheap low-fidelity (LF) QM data to implicitly learn high-fidelity (HF) neural network potentials, sharply reducing HF data requirements while preserving accuracy and inference cost. Fidelity-specific heads decode $E^{HF}$ and $E^{LF}$ from a shared latent representation, so abundant LF labels inform the HF prediction and extend chemical coverage. Across multiple QM datasets and distribution shifts, IDLe reaches chemical accuracy with far fewer HF labels and delivers substantial computational savings (up to ~25x reductions in HF data and CPU time) in both IID and out-of-distribution settings. The work also contributes a large dataset of LF labels (~11 million points) to support future multi-fidelity NNP research, with broader implications for efficient MD simulations in materials science and drug discovery.
Abstract
Neural network potentials (NNPs) offer a fast and accurate alternative to ab-initio methods for molecular dynamics (MD) simulations but are hindered by the high cost of generating training data with high-fidelity quantum mechanics (QM) methods. Our work introduces the Implicit Delta Learning (IDLe) method, which reduces the need for high-fidelity QM data by leveraging cheaper semi-empirical QM computations without compromising NNP accuracy or inference cost. IDLe employs an end-to-end multi-task architecture with fidelity-specific heads that decode energies from a shared latent representation of the input atomistic system. In various settings, IDLe achieves the same accuracy as baselines trained only on high-fidelity data while using up to 50x less high-fidelity data. This result could significantly reduce the cost of data generation, which in turn could improve the accuracy, generalization, and chemical coverage of NNPs and advance MD simulations for materials science and drug discovery. Additionally, we provide a novel set of 11 million semi-empirical QM calculations to support future multi-fidelity NNP modeling.
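
The multi-task idea in the abstract can be illustrated with a minimal sketch: one shared encoder, one energy head per fidelity, and a loss that skips labels a sample does not have. This is not the authors' implementation. The toy dense encoder, the fixed-length feature vectors, the squared-error loss, and all names (`init_model`, `predict`, `multitask_loss`, the `"LF"`/`"HF"` keys) are assumptions made for illustration; a real NNP would encode atomic coordinates and species with an atomistic network.

```python
import math
import random

def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out x n_in) with a zero bias vector."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(params, x):
    """Apply an affine map to the vector x."""
    w, b = params
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def init_model(n_features, n_latent, fidelities, seed=0):
    """One shared encoder plus one scalar-output head per fidelity (hypothetical layout)."""
    rng = random.Random(seed)
    return {
        "encoder": make_linear(n_features, n_latent, rng),
        "heads": {f: make_linear(n_latent, 1, rng) for f in fidelities},
    }

def predict(model, x):
    """Return one energy per fidelity, all decoded from the same latent vector."""
    latent = [math.tanh(v) for v in linear(model["encoder"], x)]
    return {f: linear(head, latent)[0] for f, head in model["heads"].items()}

def multitask_loss(model, batch):
    """Mean squared error over the labels that are present.

    Each sample is (features, labels), where labels maps fidelity -> energy.
    A sample with only an LF label still contributes through the LF head,
    and therefore through the shared encoder.
    """
    total, count = 0.0, 0
    for x, labels in batch:
        preds = predict(model, x)
        for fidelity, target in labels.items():
            total += (preds[fidelity] - target) ** 2
            count += 1
    return total / count if count else 0.0

# Toy usage: the second sample has no HF label (made-up numbers, not real data).
model = init_model(n_features=3, n_latent=4, fidelities=("LF", "HF"))
batch = [
    ([0.1, 0.2, 0.3], {"LF": -1.0, "HF": -1.2}),
    ([0.4, 0.1, 0.0], {"LF": -0.7}),
]
loss = multitask_loss(model, batch)
hf_energy = predict(model, [0.1, 0.2, 0.3])["HF"]
```

In this sketch only the HF head has to be evaluated at inference time, which is one way to read the abstract's claim that inference cost is not increased; the LF head exists only to shape the shared representation during training.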
