Diffusion Models are Molecular Dynamics Simulators

Justin Diamond, Markus Lill

TL;DR

This work reframes diffusion-based molecular sampling as a form of molecular dynamics by equipping denoising diffusion steps with a simple harmonic adapter that creates a quadratic coupling between consecutive states. The key result is an exact Euler-Maruyama (EM) equivalence: each reverse-diffusion step with the adapter corresponds to one EM step of overdamped Langevin dynamics, with an implicit time step Δt = β/(2k) set by the spring. The authors derive a finite-schedule KL bound showing convergence to MD as the grid is refined and the score model improves, and they provide a continuous-time limit, a practical algorithm, and a path to time-parallel trajectory generation. Empirically, the approach yields MD-like trajectories and Boltzmann-consistent statistics from a model trained only on static configurations, enabling trajectory-level observables with potentially orders-of-magnitude speedups and flexible coupling to MCMC, metadynamics, and alchemical methods. Overall, the work points toward data-driven, scalable MD that preserves thermodynamics while exploiting diffusion-model training and parallelism.

Abstract

We prove that a denoising diffusion sampler equipped with a sequential bias across the batch dimension is exactly an Euler-Maruyama integrator for overdamped Langevin dynamics. Each reverse denoising step, with its associated spring stiffness, can be interpreted as one step of a stochastic differential equation with an effective time step set jointly by the noise schedule and that stiffness. The learned score then plays the role of the drift, equivalently the gradient of a learned energy, yielding a precise correspondence between diffusion sampling and Langevin time evolution. This equivalence recasts molecular dynamics (MD) in terms of diffusion models. Accuracy is no longer tied to a fixed, extremely small MD time step; instead, it is controlled by two scalable knobs: model capacity, which governs how well the drift is approximated, and the number of denoising steps, which sets the integrator resolution. In practice, this leads to a fully data-driven MD framework that learns forces from uncorrelated equilibrium snapshots, requires no hand-engineered force fields, uses no trajectory data for training, and still preserves the Boltzmann distribution associated with the learned energy. We derive trajectory-level, information-theoretic error bounds that cleanly separate discretization error from score-model error, clarify how temperature enters through the effective spring, and show that the resulting sampler generates molecular trajectories with MD-like temporal correlations, even though the model is trained only on static configurations.
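For readers unfamiliar with the integrator named in the abstract, the following is a minimal sketch of a plain Euler-Maruyama scheme for overdamped Langevin dynamics, $dX_t = -\beta D\,\nabla V(X_t)\,dt + \sqrt{2D}\,dW_t$. It is not the paper's sampler: the function name `euler_maruyama_overdamped`, its parameters, and the use of an analytic `grad_V` in place of a learned score are all assumptions made for illustration.

```python
import numpy as np


def euler_maruyama_overdamped(grad_V, x0, dt, n_steps, beta=1.0, D=1.0, seed=0):
    """Integrate dX = -beta*D*grad_V(X) dt + sqrt(2*D) dW with Euler-Maruyama.

    grad_V  : callable returning the gradient of the potential (hypothetical
              stand-in for a learned energy gradient / negative score).
    x0      : initial state, any array shape (e.g. many independent chains).
    dt      : fixed time step (assumed uniform here for simplicity).
    Returns an array of shape (n_steps + 1, *x0.shape) holding the trajectory.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    traj = np.empty((n_steps + 1,) + x.shape)
    traj[0] = x
    for n in range(n_steps):
        xi = rng.standard_normal(x.shape)  # unit Gaussian noise
        # Drift step plus noise of variance 2*D*dt per coordinate.
        x = x - beta * D * grad_V(x) * dt + np.sqrt(2.0 * D * dt) * xi
        traj[n + 1] = x
    return traj
```

As a sanity check under these assumptions, running many independent chains in the harmonic potential $V(x)=x^2/2$ (so `grad_V = lambda x: x`) with `beta = D = 1` should give a late-time variance close to $1/\beta$, the Boltzmann value, up to a small discretization bias of order `dt`.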

Paper Structure

This paper contains 102 sections, 20 theorems, 99 equations, 17 figures, 1 algorithm.

Key Result

Theorem 1

With $\beta D=1$, there is a constant $C=C(D,L)$, depending on the Lipschitz constant $L$ of $\nabla V$, such that the finite-schedule pathwise KL bound holds. Consequently, by Pinsker's inequality, $\|\mathcal{L}(\tilde{X}_{[0,T]})-\mathcal{L}(X_{[0,T]})\|_{\mathrm{TV}} \le \tfrac12\sqrt{T\,\bar{\varepsilon}^2+C\sum_n\Delta t^2}.$
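To get a feel for how the total-variation bound above behaves, the right-hand side can be evaluated numerically. The sketch below is illustrative only: the helper name `tv_bound` is hypothetical, the sum is read as running over per-step sizes, and the value of $C$ is an arbitrary placeholder, since the theorem asserts only that such a constant exists.

```python
import math


def tv_bound(T, eps_bar, C, dts):
    """Evaluate 0.5 * sqrt(T * eps_bar**2 + C * sum_n dt_n**2).

    T       : trajectory horizon.
    eps_bar : score-model error level (eps-bar in the theorem).
    C       : constant from the theorem (placeholder value, an assumption).
    dts     : iterable of step sizes, assumed to sum to T.
    """
    discretization = C * sum(dt ** 2 for dt in dts)
    model_error = T * eps_bar ** 2
    return 0.5 * math.sqrt(model_error + discretization)


# Uniform grids on [0, T]: sum_n dt^2 = T^2 / N, so refining the grid
# shrinks the discretization term while the score-error term stays fixed.
T, eps_bar, C = 1.0, 0.05, 1.0
coarse = tv_bound(T, eps_bar, C, [T / 10] * 10)
fine = tv_bound(T, eps_bar, C, [T / 1000] * 1000)
```

With these placeholder values `fine` is smaller than `coarse`, and as the grid is refined further the bound approaches the floor $\tfrac12\sqrt{T}\,\bar{\varepsilon}$ set by the score-model error alone. This mirrors the separation of discretization error from score-model error described in the abstract.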

Figures (17)

  • Figure 1: Left: traditional i.i.d. diffusion sampling ignores temporal structure. Right: our harmonically‑coupled sampler recovers time‑correlated trajectories consistent with MD.
  • Figure 2: Radius‐of‐gyration traces for nine $\mathrm{C}_{\!13}$ hydrocarbon conformers: diffusion sampler (centre) versus OpenMM reference (right). Each plot is cropped by a molecule‐specific fraction to focus on the dynamic range of interest.
  • Figure 3: (continued) Remaining conformers (6–9). Diffusion and OpenMM agree well except for molecule 9, whose reference trajectory failed.
  • Figure 4: Batch–to–batch correlation map for the diffusion sampler. Each pixel shows the Pearson correlation between internal coordinates of two batch elements; warm colours indicate highly similar conformations. Insets highlight two structurally related states connected by a high-probability transition (black circle).
  • Figure 5: Top: quantitative agreement between energy spectra generated by the score-based diffusion model and classical OpenMM dynamics. Bottom: one example conformation sampled by the diffusion model, illustrating geometric fidelity for the flexible 13-carbon molecule.
  • ...and 12 more figures

Theorems & Definitions (30)

  • Theorem 1: Finite‑schedule pathwise KL bound
  • proof: Derivation sketch
  • Theorem 2: Diffusion $\Rightarrow$ MD in the fine‑grid/universal‑approximation limit
  • proof
  • Theorem 3: Finite‑schedule path KL: variance tempering adds no term
  • proof
  • Corollary 1: “No extra error” under our error model
  • Lemma 1: Noise fusion
  • proof
  • Theorem 4: Weak-$2$ consistency for the sum SDE
  • ...and 20 more