BoostMD: Accelerating molecular sampling by leveraging ML force field features from previous time-steps

Lars L. Schaaf, Ilyes Batatia, Christoph Brunken, Thomas D. Barrett, Jules Tilly

TL;DR

BoostMD is a surrogate architecture that speeds up molecular dynamics (MD) driven by ML force fields (MLFFs) by reusing node features from previous time steps to predict energy changes at the current step. The expensive reference MLFF is evaluated only every $N$ steps, while a lightweight, equivariant BoostMD model handles the intermediate steps, delivering substantial speedups without sacrificing accuracy. The method relies on a reference-framing scheme and equivariant message passing to maintain physical consistency and momentum conservation between boost steps. Empirical results on dipeptide systems show up to an $8\times$ speedup, robust generalization to unseen molecules, and accurate sampling of the Boltzmann distribution, suggesting BoostMD as a practical path toward long-timescale, high-accuracy MD simulations.

Abstract

Simulating atomic-scale processes, such as protein dynamics and catalytic reactions, is crucial for advancements in biology, chemistry, and materials science. Machine learning force fields (MLFFs) have emerged as powerful tools that achieve near quantum mechanical accuracy, with promising generalization capabilities. However, their practical use is often limited by long inference times compared to classical force fields, especially when running extensive molecular dynamics (MD) simulations required for many biological applications. In this study, we introduce BoostMD, a surrogate model architecture designed to accelerate MD simulations. BoostMD leverages node features computed at previous time steps to predict energies and forces based on positional changes. This approach reduces the complexity of the learning task, allowing BoostMD to be both smaller and significantly faster than conventional MLFFs. During simulations, the computationally intensive reference MLFF is evaluated only every $N$ steps, while the lightweight BoostMD model handles the intermediate steps at a fraction of the computational cost. Our experiments demonstrate that BoostMD achieves an eight-fold speedup compared to the reference model and generalizes to unseen dipeptides. Furthermore, we find that BoostMD accurately samples the ground-truth Boltzmann distribution when running molecular dynamics. By combining efficient feature reuse with a streamlined architecture, BoostMD offers a robust solution for conducting large-scale, long-timescale molecular simulations, making high-accuracy ML-driven modeling more accessible and practical.
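The alternation between the reference model and the surrogate can be sketched in a few lines. The following is a minimal toy illustration, not the authors' implementation: `reference_model`, `boost_model`, and `run_md` are hypothetical names, the "node features" are simply the cached positions, and the potential is a harmonic well chosen so that the surrogate reproduces the reference forces exactly. Only the scheduling (one reference call every `n_boost` steps, surrogate calls in between, features cached from the last reference step) mirrors the description above.

```python
import numpy as np


def reference_model(positions):
    """Toy stand-in for the expensive MLFF (assumption: harmonic well, unit mass).

    Returns hypothetical "node features" and forces.
    """
    features = positions.copy()  # placeholder for learned node features
    forces = -positions          # F = -dE/dx for E = 0.5 * |x|^2
    return features, forces


def boost_model(ref_features, ref_positions, positions):
    """Toy stand-in for the surrogate: forces from cached features plus displacement."""
    displacement = positions - ref_positions
    return -ref_features - displacement


def run_md(positions, velocities, n_steps, n_boost, dt=0.01):
    """Velocity Verlet, calling the reference model only every n_boost steps."""
    positions = np.array(positions, dtype=float)    # copies, so inputs stay untouched
    velocities = np.array(velocities, dtype=float)

    # Step 0 always uses the reference model to populate the feature cache.
    ref_features, forces = reference_model(positions)
    ref_positions = positions.copy()
    ref_calls = 1

    for step in range(1, n_steps + 1):
        velocities += 0.5 * dt * forces
        positions += dt * velocities
        if step % n_boost == 0:
            # Refresh the cached features with the expensive model.
            ref_features, forces = reference_model(positions)
            ref_positions = positions.copy()
            ref_calls += 1
        else:
            # Intermediate step: cheap surrogate using the cached features.
            forces = boost_model(ref_features, ref_positions, positions)
        velocities += 0.5 * dt * forces

    return positions, velocities, ref_calls


if __name__ == "__main__":
    x0 = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]])
    v0 = np.zeros_like(x0)
    x, v, calls = run_md(x0, v0, n_steps=100, n_boost=10)
    print("reference calls:", calls)
```

In this sketch the reference model is called once at the start and then at every tenth step, so most force evaluations go through the cheap surrogate. In a real setting the surrogate is a learned model and only approximates the reference forces.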

Paper Structure

This paper contains 29 sections, 11 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Properties of hidden features and the BoostMD architecture. (a) Node features of a foundation model for organic molecules (MACE-OFF23; Kovács et al., 2023) as a function of MD steps. After 150 fs the molecule is artificially split in two. The hidden features oscillate during MD and change significantly only under this severe configurational change. (b) Proposed method of using node features from previous time steps to predict energy differences using BoostMD models (c).
  • Figure 2: BoostMD architecture and reference framing. The steps of BoostMD models, highlighting the reference-framing step. The reference node features and reference positions are transformed to make BoostMD translationally and rotationally equivariant between steps, as detailed in Equation \ref{eq:ref-frame-main}. The figures show the receptive field of an atom $i$ with neighbors $j$, showing both the current (blue) and reference (red) positions. The central red arrow represents an equivariant reference feature of the atom $i$, while the black arrows show the vectors associated with the label underneath each image.
  • Figure 3: Free energy surface of an unseen dipeptide. Comparison of the samples obtained by running metadynamics using the ground-truth MACE-OFF model and BoostMD. The free energy shown in the Ramachandran plot is directly related to the marginalized Boltzmann distribution $\sim \exp[-F(\phi, \psi)/k_BT]$, where $\phi, \psi$ are the dihedral angles marked in the rendered molecule (left). During BoostMD, the reference model is evaluated every 10 steps. Both simulations are run for 5 ns ($5 \times 10^6$ steps).
  • Figure 4: Node feature properties. (a) Fluctuations of the reference node features as a function of simulation time. (b) Change in node features as a function of the change in position inside the atomic environment.
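
The relation between the free energy surface in Figure 3 and the sampled distribution can be made explicit. What follows is the standard statistical-mechanics identity rather than a formula quoted from the paper. The symbols $P(\phi, \psi)$ (the marginal equilibrium probability density over the two dihedral angles) and $C$ (an arbitrary additive constant) are introduced here for illustration:

$$
P(\phi, \psi) \propto \exp\!\left[-\frac{F(\phi, \psi)}{k_B T}\right]
\quad\Longleftrightarrow\quad
F(\phi, \psi) = -k_B T \,\ln P(\phi, \psi) + C .
$$

Two models that sample the same Boltzmann distribution therefore produce free energy surfaces that agree up to the constant $C$, which is why the surfaces can be compared directly.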