Enhancing Diffusion-Based Sampling with Molecular Collective Variables

Juno Nam, Bálint Máté, Artur P. Toshev, Manasa Kaniselvan, Rafael Gómez-Bombarelli, Ricky T. Q. Chen, Brandon Wood, Guan-Horng Liu, Benjamin Kurt Miller

TL;DR

The paper addresses a key difficulty in sampling Boltzmann ensembles of molecular systems with diffusion-based samplers: they often miss low-population but thermodynamically important states. It introduces WT-ASBS, a Well-Tempered Adjoint Schrödinger Bridge Sampler that applies a well-tempered bias along selected collective variables (CVs) through a repulsive potential updated online, which broadens exploration while keeping the samples reweightable to the Boltzmann distribution. Key contributions include a convergence guarantee to the well-tempered target, a practical training recipe (pretraining, CV selection, restraints, and sampling/refinement), and demonstrations on alanine dipeptide (Ala2), alanine tetrapeptide (Ala4), an SN2 reaction, and a post-transition-state (post-TS) bifurcation, with accurate potentials of mean force (PMFs) and shorter wall-clock times than standard enhanced-sampling baselines. The approach combines diffusion-based sampling with classical enhanced-sampling ideas and extends diffusion samplers to complex molecular landscapes and reactive energy surfaces, using both classical force fields and interatomic potentials of near-DFT accuracy.

Abstract

Diffusion-based samplers learn to sample complex, high-dimensional distributions using energies or log densities alone, without training data. Yet, they remain impractical for molecular sampling because they are often slower than molecular dynamics and miss thermodynamically relevant modes. Inspired by enhanced sampling, we encourage exploration by introducing a sequential bias along bespoke, information-rich, low-dimensional projections of atomic coordinates known as collective variables (CVs). We introduce a repulsive potential centered on the CVs from recent samples, which pushes future samples towards novel CV regions and effectively increases the temperature in the projected space. The resulting method improves efficiency and mode discovery, enables the estimation of free energy differences, and retains independent sampling from the approximate Boltzmann distribution via reweighting by the bias. On standard peptide conformational sampling benchmarks, the method recovers diverse conformational states and accurate free energy profiles. We are the first to demonstrate reactive sampling using a diffusion-based sampler, capturing bond breaking and formation with universal interatomic potentials at near-first-principles accuracy. The approach resolves reactive energy landscapes at a fraction of the wall-clock time of standard sampling methods, advancing diffusion-based sampling towards practical use in molecular sciences.
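The repulsive, history-dependent bias described in the abstract resembles Gaussian deposition in classical well-tempered metadynamics. The following is a minimal sketch of that classical update on a one-dimensional, non-periodic CV grid, not the paper's implementation. The function name `deposit_well_tempered`, the grid representation, and all parameter values (`height0`, `sigma`, `gamma`, `beta`) are illustrative assumptions.

```python
import numpy as np


def deposit_well_tempered(bias, grid, cv_samples, *, height0, sigma, gamma, beta):
    """Return a copy of `bias` after depositing one Gaussian per CV sample.

    Classical well-tempered rule (assumed here, not taken from the paper):
    each Gaussian has height height0 * exp(-beta * V(s) / (gamma - 1)),
    so deposition slows down wherever bias has already accumulated.
    """
    bias = np.array(bias, dtype=float)  # copy, so the caller's array is untouched
    for s in np.atleast_1d(cv_samples):
        v_at_s = np.interp(s, grid, bias)  # current bias at the sampled CV value
        height = height0 * np.exp(-beta * v_at_s / (gamma - 1.0))
        bias += height * np.exp(-0.5 * ((grid - s) / sigma) ** 2)
    return bias


# Hypothetical settings, chosen only for illustration.
grid = np.linspace(-3.0, 3.0, 301)
params = dict(height0=1.0, sigma=0.2, gamma=5.0, beta=1.0)

# Two deposits at the same CV value: the second Gaussian is tempered.
b1 = deposit_well_tempered(np.zeros_like(grid), grid, [0.0], **params)
b2 = deposit_well_tempered(b1, grid, [0.0], **params)

# A batch of "recent samples" stuck in one basin near s = -1.
rng = np.random.default_rng(0)
recent_cvs = rng.normal(loc=-1.0, scale=0.2, size=200)
bias = deposit_well_tempered(np.zeros_like(grid), grid, recent_cvs, **params)
```

The bias piles up where recent samples concentrate and stays near zero elsewhere, which is what pushes later samples towards unvisited CV regions. Torsional CVs such as $\phi$ and $\psi$ would additionally need periodic distances in place of `grid - s`.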

Paper Structure

This paper contains 80 sections, 2 theorems, 37 equations, 18 figures, 2 tables, 2 algorithms.

Key Result

Proposition 3.1

When the WT-ASBS algorithm is run until convergence, the bias potential $V_k$ converges almost surely to $V^\ast(s) = -\left(1 - \frac{1}{\gamma}\right)F(s) + \mathrm{const}$, and hence the sampled distribution converges to the well-tempered target density.
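To see why this fixed point gives a well-tempered distribution along the CVs, it helps to write out the CV marginal of a biased ensemble. The derivation below is a sketch under assumed notation that does not appear in the proposition: $U(x)$ is the potential energy, $\xi(x)$ the CV map, $\beta = 1/(k_\mathrm{B}T)$ the inverse temperature, and $p_0(s) \propto e^{-\beta F(s)}$ the unbiased CV marginal, with $F(s)$ taken to be the unbiased PMF.

```latex
\begin{align*}
p_V(s) &\propto \int \delta\bigl(s - \xi(x)\bigr)\,
          e^{-\beta\,[U(x) + V(\xi(x))]}\,\mathrm{d}x
        \;\propto\; e^{-\beta\,[F(s) + V(s)]}, \\
F(s) + V^\ast(s) &= F(s) - \Bigl(1 - \tfrac{1}{\gamma}\Bigr) F(s) + \mathrm{const}
        = \tfrac{1}{\gamma}\,F(s) + \mathrm{const}, \\
p_{V^\ast}(s) &\propto e^{-\beta F(s)/\gamma}
        \;\propto\; \bigl[p_0(s)\bigr]^{1/\gamma}.
\end{align*}
```

Under these assumptions the CV marginal is the Boltzmann marginal at an effective temperature $\gamma T$, so barriers in $F$ are scaled by $1/\gamma$. Inverting the fixed point gives $F(s) = -\frac{\gamma}{\gamma - 1} V^\ast(s) + \mathrm{const}$, and unbiased averages follow by weighting each sample $x$ by $e^{\beta V^\ast(\xi(x))}$.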

Figures (18)

  • Figure 1: Scheme for neural enhanced sampling. (Left) Local sampling and pretraining near the reference configuration. (Middle) Energy-based sampler training with a CV-space bias that is updated online via well-tempered deposition. (Right) After convergence, the final bias yields the potential of mean force (PMF) along CVs; also, reweighting recovers the Boltzmann ensemble, including samples and free energy differences.
  • Figure 2: Sampling alanine dipeptide. (a) Molecular structure of alanine dipeptide and definition of the torsional collective variables $\phi$ and $\psi$. (b) Distribution of $\phi$ and $\psi$ obtained from a short unbiased MD trajectory used for pretraining the sampler. (c) Evolution of the PMF reconstructed from the accumulated bias during training, illustrating progressive recovery of the major basins in the free energy landscape. The dotted box delineates the two states, one with $\phi < 0$ or $\phi > 2$ and the other with $\phi \in [0, 2]$. (d) PMF estimated by reweighting $10^6$ samples generated by the trained sampler using the final bias. (e) Reference PMF computed from a long unbiased MD simulation for comparison. (f) Convergence of free energy differences $\Delta F$ between two states as a function of number of energy evaluations during training.
  • Figure 3: Sampling alanine tetrapeptide. (a) Molecular structure of alanine tetrapeptide with torsional collective variables $\phi_1$, $\phi_2$, and $\phi_3$. (b) Reference distribution in torsional space. Each torsion is partitioned into two regions, $\phi_i \in [0, 2]$ and $\phi_i < 0$ or $\phi_i > 2$, producing eight metastable states corresponding to all combinations of these regions across the three $\phi_i$. Pretraining is restricted to the single state indicated by $\color{red}\ast$, so the sampler initially encounters only one of the eight possible regions. (c) Accumulated number of explored states during the initial phase of training or simulation (up to 2M energy evaluations), showing how the sampler progressively discovers additional states as the bias is deposited. (d) Convergence of the free energies of the eight states, reported as the MAE up to an additive constant. For WT-ASBS, the curve extends up to 20M energy evaluations where the bias has converged.
  • Figure 4: Sampling reactive landscapes. (a) SN2 reaction scheme with nucleophilic substitution between states 1 and 2, along with the two bond distance CVs $d_1$ and $d_2$ that describe approach and departure of the chloride ions. (b) PMF $F(d_1, d_2)$ for the SN2 system, showing the reactant and product basins and the transition-state region along the two-dimensional distance coordinates. (c) Cycloaddition reaction scheme exhibiting a post-TS bifurcation, highlighting the competing pathways that originate from the same transition structure. (d) Definition of the three forming bond distances $d_1$, $d_2$, and $d_3$ used to parameterize progress along the bifurcating reaction pathways. (e) Contact CVs $c_i$ obtained from the distances $d_i$, used to construct smoother CVs for sampling. (f) One-dimensional PMF $F(s_1)$ along the CV $s_1 = c_1 + c_2 + c_3$, comparing WT-ASBS and WTMetaD. (g) Two-dimensional PMF $F(s_1, s_2)$ with $s_2 = c_2 - c_3$, resolving the post-TS bifurcation and showing the distinct product channels, with TS locations from saddle point optimization.
  • Figure 5: Convergence of PMFs. PMF MAE from the final converged PMF for each reaction and method, compared over wall-clock time.
  • ...and 13 more figures
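The reweighting step mentioned in the captions of Figure 1 and Figure 2(d) can be sketched as follows. This is a toy check under stated assumptions, not the authors' code: the helper names (`boltzmann_weights`, `reweighted_pmf`, `delta_f`), the double-well free energy, and the values of `beta` and `gamma` are all hypothetical, and the biased samples are drawn directly from a grid rather than from a trained sampler.

```python
import numpy as np


def boltzmann_weights(bias_at_cv, beta):
    """Normalized weights w_i proportional to exp(beta * V(s_i))."""
    log_w = beta * np.asarray(bias_at_cv, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    return w / w.sum()


def reweighted_pmf(cv, bias_at_cv, *, beta, bins, value_range):
    """PMF from a weighted histogram, shifted so its finite minimum is zero."""
    w = boltzmann_weights(bias_at_cv, beta)
    hist, edges = np.histogram(cv, bins=bins, range=value_range, weights=w)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        pmf = -np.log(hist) / beta  # empty bins become +inf
    return centers, pmf - pmf[np.isfinite(pmf)].min()


def delta_f(in_state_b, bias_at_cv, *, beta):
    """F_B - F_A, where state A is the complement of the boolean mask for B."""
    w = boltzmann_weights(bias_at_cv, beta)
    p_b = w[in_state_b].sum()
    return -np.log(p_b / (1.0 - p_b)) / beta


# Toy 1D system: tilted double well and its converged well-tempered bias.
beta, gamma = 1.0, 4.0
grid = np.linspace(-2.0, 2.0, 401)
free_energy = 4.0 * (grid**2 - 1.0) ** 2 + grid
final_bias = -(1.0 - 1.0 / gamma) * free_energy

# Draw samples from the biased CV density exp(-beta * (F + V)).
p_biased = np.exp(-beta * (free_energy + final_bias))
p_biased /= p_biased.sum()
rng = np.random.default_rng(0)
idx = rng.choice(grid.size, size=200_000, p=p_biased)
cv, bias_at_cv = grid[idx], final_bias[idx]

# One histogram bin per grid point, so bin centers coincide with the grid.
h = grid[1] - grid[0]
centers, pmf = reweighted_pmf(
    cv, bias_at_cv, beta=beta, bins=grid.size,
    value_range=(grid[0] - h / 2, grid[-1] + h / 2),
)
df_est = delta_f(cv > 0.0, bias_at_cv, beta=beta)

# Exact reference on the same grid for comparison.
p_exact = np.exp(-beta * free_energy)
df_exact = -np.log(p_exact[grid > 0.0].sum() / p_exact[grid <= 0.0].sum()) / beta
```

Because the samples are independent draws, each one carries its own weight and no time-ordering correction is needed. The estimate degrades in regions where the biased density is still small, which is why the comparison is only meaningful within a few $k_\mathrm{B}T$ of the minima.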

Theorems & Definitions (3)

  • Proposition 3.1: Convergence of WT-ASBS
  • Remark 3.1
  • Proposition B.1: Convergence of WT-ASBS