chemtrain: Learning Deep Potential Models via Automatic Differentiation and Statistical Physics
Paul Fuchs, Stephan Thaler, Sebastien Röcken, Julija Zavadlav
TL;DR
This work tackles data-efficient training of neural-network potentials for molecular dynamics by integrating top-down and bottom-up learning within a unified differentiable framework. The authors introduce chemtrain, a modular, JAX-based software package that decomposes learning into reusable building blocks (Force Matching (FM), Relative Entropy Minimization (RM), Differentiable Trajectory Reweighting (DiffTRe), and Delta-Learning) and provides a high-level API for composing them into training routines. They demonstrate the approach with two case studies: fusing experimental and simulation data to build an atomistic model of titanium, and combining FM with RM to train a coarse-grained implicit solvent model of alanine dipeptide. Both cases achieve improved accuracy and data efficiency. The framework emphasizes scalability, uncertainty handling, and practical deployment potential; future directions include active learning, expanded observables, and tighter MD-code integration for large-scale simulations.
Abstract
Neural Networks (NNs) are effective models for refining the accuracy of molecular dynamics, opening up new fields of application. Typically trained bottom-up, atomistic NN potential models can reach first-principles accuracy, while coarse-grained implicit solvent NN potentials surpass classical continuum solvent models. However, accurate reference data is costly to generate and common bottom-up training is data-inefficient; overcoming both limitations demands efficient incorporation of data from many sources. This paper introduces the framework chemtrain to learn sophisticated NN potential models through customizable training routines and advanced training algorithms. These routines can combine multiple top-down and bottom-up algorithms, e.g., to incorporate both experimental and simulation data or to pre-train potentials with less costly algorithms. chemtrain provides an object-oriented high-level interface to simplify the creation of custom routines. On the lower level, chemtrain relies on JAX to compute gradients and scale the computations to use available resources. We demonstrate the simplicity and importance of combining multiple algorithms through two examples: parametrizing an all-atomistic model of titanium and a coarse-grained implicit solvent model of alanine dipeptide.
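To make the role of automatic differentiation concrete, the following minimal sketch shows bottom-up force matching written in plain JAX. It does not use the chemtrain API. The toy harmonic-bond potential, the synthetic reference data, the learning rate, and all function names (`energy`, `forces`, `fm_loss`, `step`) are assumptions chosen for illustration, and plain gradient descent stands in for whatever optimizer a real training routine would use.

```python
import jax
import jax.numpy as jnp


def energy(params, positions):
    """Toy potential (assumption): harmonic bonds between consecutive particles."""
    k, r0 = params
    bond_vectors = positions[1:] - positions[:-1]
    bond_lengths = jnp.linalg.norm(bond_vectors, axis=-1)
    return 0.5 * k * jnp.sum((bond_lengths - r0) ** 2)


def forces(params, positions):
    # Forces are the negative gradient of the energy w.r.t. the positions.
    return -jax.grad(energy, argnums=1)(params, positions)


def fm_loss(params, positions_batch, reference_forces):
    # Mean squared error between predicted and reference forces.
    predicted = jax.vmap(forces, in_axes=(None, 0))(params, positions_batch)
    return jnp.mean((predicted - reference_forces) ** 2)


# Synthetic reference data (assumption): a perturbed 4-particle chain whose
# "true" forces come from the same functional form with known parameters.
key = jax.random.PRNGKey(0)
base = jnp.arange(4.0)[:, None] * jnp.array([1.0, 0.0, 0.0])
positions_batch = base + 0.1 * jax.random.normal(key, (32, 4, 3))
true_params = jnp.array([2.0, 1.0])
reference_forces = jax.vmap(forces, in_axes=(None, 0))(true_params, positions_batch)


@jax.jit
def step(params):
    # Differentiate the loss w.r.t. the potential parameters; this is a
    # gradient of a gradient, handled by JAX without extra code.
    loss, grads = jax.value_and_grad(fm_loss)(params, positions_batch, reference_forces)
    return params - 0.05 * grads, loss


params = jnp.array([1.0, 0.8])  # deliberately wrong initial guess
initial_loss = fm_loss(params, positions_batch, reference_forces)
for _ in range(200):
    params, _ = step(params)
final_loss = fm_loss(params, positions_batch, reference_forces)
print("loss before:", float(initial_loss), "loss after:", float(final_loss))
```

The same pattern carries over when `energy` is replaced by a neural network: the forces remain a gradient of the energy, and the loss gradient with respect to the network weights is again obtained by differentiating through that force computation.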
