Enhanced Diffusion Sampling: Efficient Rare Event Sampling and Free Energy Calculation with Diffusion Models
Yu Xie, Ludwig Winkler, Lixin Sun, Sarah Lewis, Adam E. Foster, José Jiménez Luna, Tim Hempel, Michael Gastegger, Yaoyi Chen, Iryna Zaporozhets, Cecilia Clementi, Christopher M. Bishop, Frank Noé
TL;DR
The paper tackles the dual challenge of slow mixing and rare-state estimation in molecular dynamics by combining diffusion-model samplers with classical enhanced-sampling biases and exact reweighting. It introduces three algorithms, UmbrellaDiff, ΔG-Diff, and MetaDiff, that steer pretrained diffusion models toward biased ensembles and recover unbiased thermodynamics via MBAR/WHAM. This enables accurate folding free-energy calculations and rare-event statistics within GPU-minutes to hours per system. Key contributions include a general steering framework for diffusion models, diffusion-model implementations of umbrella sampling and metadynamics, and demonstrated efficiency gains on protein folding systems using BioEmu. This work offers a practical route to computing rare-state observables and free energies routinely with diffusion-model samplers, broadening the applicability of data-driven equilibrium ensembles in biomolecular modeling.
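The steering-plus-reweighting idea summarized above can be illustrated with a minimal, generic sketch of umbrella sampling followed by WHAM on a one-dimensional grid, written in Python with NumPy. It is not the paper's UmbrellaDiff implementation. The toy double-well landscape, the harmonic window centers and force constant `kappa`, and the exact grid sampler that stands in for a steered diffusion model are all illustrative assumptions, and WHAM is used in place of MBAR for brevity.

```python
import numpy as np

# Hypothetical 1D toy landscape on a grid (energies in kT): a tilted double
# well whose right-hand basin (x > 0) is the less populated state.
x = np.linspace(-2.0, 2.0, 201)
F_true = 5.0 * (x**2 - 1.0) ** 2 + 1.5 * x
p_true = np.exp(-(F_true - F_true.min()))
p_true /= p_true.sum()

# Hypothetical harmonic umbrella biases U_k(x) = 0.5 * kappa * (x - c_k)^2, in kT.
centers = np.linspace(-1.8, 1.8, 25)
kappa = 50.0
bias = 0.5 * kappa * (x[None, :] - centers[:, None]) ** 2  # shape (K, n_grid)
boltz = np.exp(-bias)                                       # exp(-U_k(x))

# Stand-in for an ideal steered sampler (assumption): independent draws taken
# exactly from each biased distribution p_k(x) proportional to p(x) * exp(-U_k(x)).
rng = np.random.default_rng(0)
n_per_window = 20_000
counts = np.zeros_like(bias)
for k in range(len(centers)):
    pk = p_true * boltz[k]
    pk /= pk.sum()
    idx = rng.choice(x.size, size=n_per_window, p=pk)
    counts[k] = np.bincount(idx, minlength=x.size)


def wham(counts, boltz, max_iter=20_000, tol=1e-10):
    """Self-consistent WHAM on a fixed grid.

    Returns the unbiased probability on the grid and the window free
    energies f_k in kT, shifted so that f_0 = 0.
    """
    N = counts.sum(axis=1)  # samples drawn in each window
    H = counts.sum(axis=0)  # histogram pooled over all windows
    f = np.zeros(len(N))
    for _ in range(max_iter):
        # p(x) = sum_k n_k(x) / sum_k N_k exp(f_k - U_k(x))
        p = H / ((N * np.exp(f))[:, None] * boltz).sum(axis=0)
        p /= p.sum()
        # exp(-f_k) = sum_x p(x) exp(-U_k(x))
        f_new = -np.log((p[None, :] * boltz).sum(axis=1))
        f_new -= f_new[0]
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return p, f


p_est, f_est = wham(counts, boltz)
rare_true = p_true[x > 0].sum()
rare_est = p_est[x > 0].sum()
print(f"P(x > 0): exact {rare_true:.4f}, WHAM {rare_est:.4f}")
```

Because every window supplies independent samples, no window has to equilibrate; what WHAM still needs is sufficient overlap between neighboring windows, which is why the spacing of `centers` is kept small relative to each window's width.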
Abstract
The rare-event sampling problem has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that rely on states which are rare in equilibrium, for example folding free energies. Here, we introduce enhanced diffusion sampling, enabling efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to perform quantitatively accurate steering protocols to generate biased ensembles and subsequently recover equilibrium statistics via exact reweighting. We instantiate our framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), ΔG-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue for metadynamics). Across toy systems, protein folding landscapes and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to hours per system, closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.
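The key idea in the abstract, sampling a deliberately biased ensemble and then recovering equilibrium statistics by exact reweighting, can be sketched for a single tilted ensemble. This is generic importance reweighting, not the paper's ΔG-Diff algorithm: the toy landscape, the linear tilt exp(λx) with λ = 4 (`lam`), and the exact grid sampler standing in for a steered diffusion model are assumptions made for illustration. The target quantity is the free-energy difference ΔG = -kT ln(P_B/P_A) between basin A (x ≤ 0) and basin B (x > 0), reported in units of kT.

```python
import numpy as np

# Hypothetical toy landscape (kT units): the minimum of basin B (x > 0)
# lies roughly 8 kT above that of basin A, so B is rare at equilibrium.
x = np.linspace(-2.0, 2.0, 401)
F = 5.0 * (x**2 - 1.0) ** 2 + 4.0 * x
p = np.exp(-(F - F.min()))
p /= p.sum()
in_B = x > 0
dG_exact = -np.log(p[in_B].sum() / p[~in_B].sum())

rng = np.random.default_rng(1)
n = 2_000

# Direct (unbiased) sampling with the same budget: B is barely visited.
direct = x[rng.choice(x.size, size=n, p=p)]

# Tilted ensemble p_lam(x) proportional to p(x) * exp(lam * x), sampled exactly
# here as a stand-in for a steered sampler (assumption); lam = 4 cancels the
# linear tilt of F, so both basins are populated about equally.
lam = 4.0
p_tilt = p * np.exp(lam * x)
p_tilt /= p_tilt.sum()
tilted = x[rng.choice(x.size, size=n, p=p_tilt)]

# Exact reweighting back to p(x): w(x) proportional to p(x) / p_lam(x) = exp(-lam * x).
log_w = -lam * tilted
w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
w /= w.sum()
P_B = w[tilted > 0].sum()
dG_tilted = -np.log(P_B / (1.0 - P_B))
ess = 1.0 / np.sum(w**2)  # Kish effective sample size

print(f"exact dG = {dG_exact:.2f} kT")
print(f"direct sampling: {np.sum(direct > 0)} of {n} samples in B")
print(f"tilted + reweighting: dG = {dG_tilted:.2f} kT, ESS = {ess:.0f}")
```

With the same budget of 2,000 samples, direct sampling places only a handful of samples in basin B, if any, so its estimate of P_B is unusable, whereas the tilted ensemble populates both basins and the weights w ∝ exp(-λx) restore the equilibrium populations. The Kish effective sample size indicates how much of the sample budget the reweighting effectively retains.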
