Coarse-Grained Boltzmann Generators

Weilong Chen, Bojun Zhao, Jan Eckwert, Julija Zavadlav

TL;DR

Coarse-Grained Boltzmann Generators (CG-BGs) target unbiased equilibrium sampling of large molecular systems by working in coarse-grained (CG) coordinates R and reweighting generated samples with a learned potential of mean force (PMF). A flow-based model supplies the proposal, and an energy target U_η(R), learned via Enhanced Sampling Force Matching (ESFM), stands in for the true PMF U(R), so that samples can be reweighted toward the CG Boltzmann distribution p(R) ∝ e^{−βU(R)} even when the training data come from biased or rapidly converged simulations. Reducing the dimensionality makes sampling scalable while preserving thermodynamic consistency, and the learned PMF captures solvent-mediated and many-body effects in the reduced representation. The same reweighting step provides a simulation-free way to benchmark learned PMFs without running new MD simulations. The approach shows favorable accuracy–efficiency trade-offs across coarse-graining resolutions, including systems with explicit solvent.
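The reweighting step described above can be illustrated with a minimal NumPy sketch of self-normalized importance sampling. This is not the authors' implementation: the 1D double well `toy_pmf` stands in for a learned PMF U_η(R), the Gaussian `toy_log_q` stands in for a flow's exact log-density, and all function names and parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def log_weights(energy, log_q, samples, beta=1.0):
    """Unnormalized log importance weights: log w = -beta * U(R) - log q(R)."""
    return -beta * energy(samples) - log_q(samples)

def reweighted_mean(observable, samples, log_w):
    """Self-normalized importance-sampling estimate of <observable>."""
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    return np.sum(w * observable(samples))

def effective_sample_size(log_w):
    """Kish effective sample size, (sum w)^2 / sum(w^2)."""
    w = np.exp(log_w - log_w.max())
    return w.sum() ** 2 / np.sum(w ** 2)

# Hypothetical stand-in for a learned PMF: an asymmetric 1D double well.
def toy_pmf(x):
    return (x ** 2 - 1.0) ** 2 + 0.3 * x

# Hypothetical stand-in for a flow proposal: a broad Gaussian with exact log-density.
SIGMA = 1.5
def toy_log_q(x):
    return -0.5 * (x / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2.0 * np.pi))

BETA = 3.0
rng = np.random.default_rng(0)
samples = rng.normal(0.0, SIGMA, size=200_000)  # draws from the proposal

log_w = log_weights(toy_pmf, toy_log_q, samples, beta=BETA)
mean_x = reweighted_mean(lambda x: x, samples, log_w)
ess = effective_sample_size(log_w)

# Reference value of <x> under exp(-beta * U) from a uniform quadrature grid.
grid = np.linspace(-4.0, 4.0, 20001)
p = np.exp(-BETA * toy_pmf(grid))
ref_mean_x = np.sum(grid * p) / np.sum(p)

print(f"reweighted <x> = {mean_x:.3f}, reference = {ref_mean_x:.3f}, ESS = {ess:.0f}")
```

The effective sample size is a common diagnostic for how well the proposal overlaps the target: a value far below the number of samples signals that a few weights dominate the estimate.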

Abstract

Sampling equilibrium molecular configurations from the Boltzmann distribution is a longstanding challenge. Boltzmann Generators (BGs) address this by combining exact-likelihood generative models with importance sampling, but their practical scalability is limited. Meanwhile, coarse-grained surrogates enable the modeling of larger systems by reducing effective dimensionality, yet often lack the reweighting process required to ensure asymptotically correct statistics. In this work, we propose Coarse-Grained Boltzmann Generators (CG-BGs), a principled framework that unifies scalable reduced-order modeling with the exactness of importance sampling. CG-BGs act in a coarse-grained coordinate space, using a learned potential of mean force (PMF) to reweight samples generated by a flow-based model. Crucially, we show that this PMF can be efficiently learned from rapidly converged data via force matching. Our results demonstrate that CG-BGs faithfully capture complex interactions mediated by explicit solvent within highly reduced representations, establishing a scalable pathway for the unbiased sampling of larger molecular systems.
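As a rough illustration of learning a PMF by force matching, the sketch below fits a polynomial U_η(x) to noisy forces by least squares. It is a toy under stated assumptions, not the paper's ESFM procedure: the polynomial basis, the uniform sampling of x (a stand-in for non-Boltzmann, "biased" coverage), the noise level, and the function name `fit_pmf_force_matching` are all hypothetical. The point it shows is that force matching regresses the mean force at each configuration, so in this toy setting the fit does not require Boltzmann-distributed configurations, only adequate coverage and forces that are unbiased given x.

```python
import numpy as np

def fit_pmf_force_matching(x, forces, degree=4):
    """Least-squares force matching for U_eta(x) = sum_{k=1..degree} eta_k * x^k.

    The model force is -dU_eta/dx = -sum_k k * eta_k * x^(k-1).
    The constant offset of U is not identifiable from forces and is omitted.
    """
    k = np.arange(1, degree + 1)
    design = -k * x[:, None] ** (k - 1)  # shape (n_samples, degree)
    eta, *_ = np.linalg.lstsq(design, forces, rcond=None)
    return eta

# Hypothetical ground truth: U(x) = x^4 - 2 x^2 + 0.3 x (up to a constant).
def true_mean_force(x):
    return -(4.0 * x ** 3 - 4.0 * x + 0.3)

rng = np.random.default_rng(1)
# Uniform coverage of the coordinate, deliberately not Boltzmann-distributed.
x = rng.uniform(-1.8, 1.8, size=50_000)
# Instantaneous forces: mean force plus zero-mean noise from eliminated degrees of freedom.
forces = true_mean_force(x) + rng.normal(0.0, 1.0, size=x.size)

eta = fit_pmf_force_matching(x, forces, degree=4)
print("fitted coefficients (x, x^2, x^3, x^4):", np.round(eta, 3))
```

In practice the polynomial would be replaced by a neural network PMF and the least-squares solve by stochastic gradient descent on the same squared force residual.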

Paper Structure

This paper contains 39 sections, 5 theorems, 36 equations, 14 figures, 6 tables, and 3 algorithms.

Key Result

Proposition 1

Let $p^*(\mathbf{R}) \propto e^{-\beta U^*(\mathbf{R})}$ be the true marginal and $p_\eta(\mathbf{R}) \propto e^{-\beta U_\eta(\mathbf{R})}$ the learned distribution. If $p^*$ satisfies a logarithmic Sobolev inequality (LSI) with constant $\rho > 0$, then the Kullback-Leibler divergence between the ...

Figures (14)

  • Figure 1: CG-BG workflow. (1) Training data are collected and mapped from atomistic configurations to CG beads. (2) A PMF network learns $U_\eta(\mathbf{R})$ from rapidly converged data, while a normalizing flow learns a proposal density $q_\theta(\mathbf{R})$. (3) CG samples from flow models are reweighted with the PMF to recover the target distribution $p(\mathbf{R})$ and compute unbiased thermodynamic observables.
  • Figure 2: CG-BGs on the MB potential. (a) Two-dimensional MB potential energy surface (functional form given in the datasets section). (b) Marginal probability density along the $x$ coordinate. (c) Free energy profiles before and after reweighting for CG-BGs, where the flow is trained on unbiased data, compared with the exact solution and the MD reference. (d) Same as (c), but for a flow trained on biased data.
  • Figure 3: CG-BGs on alanine dipeptide (Heavy Atom). (a) Heavy Atom mapping. (b) Potential energy distributions under the learned PMF before and after reweighting, compared with the MD reference. (c) $\phi$ dihedral free energy profile before and after reweighting for CG-BGs, where the flow is trained on 500 ns of unbiased data, alongside the MD reference. (d) Same as (c), but for a flow trained on a 10 ns WT-MetaD ($\gamma=1.5$) dataset.
  • Figure 4: CG-BGs on alanine dipeptide (Core Beta). (a) Core Beta mapping. (b) Potential energy distributions under the learned PMF before and after reweighting, compared with the MD reference. (c) $\phi$ dihedral free energy profile before and after reweighting for CG-BGs, where the flow is trained on 500 ns of unbiased data, alongside the MD reference. (d) Same as (c), but for a flow trained on a 10 ns WT-MetaD ($\gamma=1.5$) dataset.
  • Figure 5: Simulation-free benchmarking of learned CG PMFs using CG-BGs (Heavy Atom). (a) Probability density of the $\phi$ dihedral angle after reweighting with PMFs trained on unbiased ($\text{PMF}_U$) and rapidly converged biased datasets ($\text{PMF}_B$), compared with the MD reference and flow proposal (trained on unbiased data). (b) Corresponding $\phi$ dihedral free energy profiles after reweighting, alongside the MD reference and flow proposal.
  • ...and 9 more figures

Theorems & Definitions (7)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 3
  • proof
  • Proposition 3
  • proof