Data-Efficient Multidimensional Free Energy Estimation via Physics-Informed Score Learning

Daniel Nagel; Tristan Bereau

TL;DR

This work addresses the cost of estimating multidimensional free-energy landscapes (FELs) from molecular dynamics by extending Fokker–Planck Score Learning (FPSL) to two dimensions. It uses a physics-informed prior based on the non-equilibrium steady state in periodic domains, enforces symmetries through Fourier features, and applies Fokker–Planck-based regularization to stabilize learning in sparsely sampled regions. The approach is validated on three systems: alanine dipeptide, a coarse-grained model of solute permeation through a lipid bilayer, and an all-atom model of ethanol permeation. In each case it yields accurate 2D free-energy surfaces with significantly less sampling than traditional umbrella sampling or ABF methods. Because FPSL learns a smooth score function rather than histogram densities, it avoids the exponential scaling of grid-based methods with the number of dimensions. These results suggest that FPSL is a data-efficient, generalizable tool for characterizing complex biomolecular thermodynamics and kinetics in higher-dimensional collective-variable (CV) spaces.

Abstract

Many biological processes involve numerous coupled degrees of freedom, yet free-energy estimation is often restricted to one-dimensional profiles to mitigate the high computational cost of multidimensional sampling. In this work, we extend Fokker–Planck Score Learning (FPSL) to efficiently reconstruct two-dimensional free-energy landscapes from non-equilibrium molecular dynamics simulations using different types of collective variables. We show that explicitly modeling orthogonal degrees of freedom reveals insights hidden in one-dimensional projections at negligible computational overhead. Additionally, exploiting symmetries in the underlying landscape enhances reconstruction accuracy, while regularization techniques ensure numerical robustness in sparsely sampled regions. We validate our approach on three distinct systems: the conformational dynamics of alanine dipeptide, as well as coarse-grained and all-atom models of solute permeation through lipid bilayers. We demonstrate that, because FPSL learns a smooth score function rather than histogram-based densities, it overcomes the exponential scaling of grid-based methods, establishing it as a data-efficient and scalable tool for multidimensional free-energy estimation.
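
The summary above states that symmetries are enforced through Fourier features but does not show how. As a rough illustration only, the sketch below embeds a periodic coordinate as sine and cosine harmonics, so that any model fed with these features is periodic by construction. Keeping only the cosine terms additionally makes the features even under x -> -x. The function name `fourier_features`, its parameters, and the choice of mirror symmetry are assumptions made for this example, not the authors' implementation.

```python
import math


def fourier_features(x, period, n_modes, even=False):
    """Embed a periodic coordinate x as a list of Fourier harmonics.

    Hypothetical helper for illustration. The output is invariant under
    x -> x + period. With even=True only cosine terms are kept, so the
    output is also invariant under x -> -x (a mirror symmetry).
    """
    feats = []
    for k in range(1, n_modes + 1):
        phase = 2.0 * math.pi * k * x / period
        feats.append(math.cos(phase))
        if not even:
            # Sine terms break the mirror symmetry, so they are skipped
            # when an even embedding is requested.
            feats.append(math.sin(phase))
    return feats


# Usage: a dihedral angle in radians with period 2*pi.
angle = 0.3
periodic = fourier_features(angle, 2.0 * math.pi, n_modes=3)
mirrored = fourier_features(-angle, 2.0 * math.pi, n_modes=3, even=True)
```

A network that only sees these features cannot represent a non-periodic (or, with `even=True`, a non-symmetric) function, which is one common way to build such constraints into a model rather than learn them from data.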

Paper Structure (23 sections, 26 equations, 7 figures)

Introduction
Theory and Methods
Background
Score-Based Diffusion Models on Periodic Domains
Non-Equilibrium Steady State of a Periodic System
Fokker–Planck Score Learning
Diffusion on Riemannian Manifolds
Extension to Two Dimensions
Training Objective and Regularization
Enforcing Symmetries via Fourier Features
Molecular Dynamics Simulations
Coarse-Grained Lipid Bilayer
All-Atom Lipid Bilayer
Alanine Dipeptide
Neural Network Architecture and Training
...and 8 more sections

Figures (7)

Figure 1: We consider a system with periodic boundary conditions, characterized by a conservative potential of mean force, $U(z, \theta)$. The center of mass of the composite particle (red and yellow) is driven by a constant external force $f$, while its orientation is subject to a constant torque $f_\theta$. The steady-state solution of the Fokker–Planck equation for a Brownian particle in a periodic potential, $p^\mathrm{ss}$, informs the score of our diffusion model, mapping the non-equilibrium steady state to the equilibrium distribution. The diffusion model smoothly interpolates between the physical non-equilibrium system at constant flux $J$ (left) and a trivial uniform prior (right). Denoising allows us to efficiently reconstruct the two-dimensional equilibrium free-energy landscape (bottom row) by exploiting the structure of $p^\mathrm{ss}$.
Figure 2: Free-energy estimation of alanine dipeptide in water: (a) Reference Ramachandran plot of the free-energy landscape as a function of the dihedral angles $\phi$ and $\psi$. (b) Depiction of alanine dipeptide. (c–h) Reconstructed free-energy landscapes from 23 ns of non-equilibrium MD data, with (c, f) and without (d, g) sampling of the $\alpha_L$ region. The upper row corresponds to the simple energy regularization and the lower row to the Fokker–Planck regularization. (e, h) Standard deviation across 50 independent runs for the simple energy regularization and the Fokker–Planck regularization, respectively.
Figure 3: Performance evaluation of Fokker–Planck Score Learning (FPSL) for reconstructing the one-dimensional free-energy profile of alanine dipeptide along the $\psi$ dihedral angle. (a) Mean absolute error (MAE) as a function of the aggregate MD simulation time. The comparison includes: FPSL trained solely on $\psi$ (1D); FPSL trained on the joint space $(\phi, \psi)$ with biasing applied only along $\phi$ (2D); and FPSL trained on $(\phi, \psi)$ with biasing applied along both $\phi$ and $\psi$ (2D+). For the 2D approaches, the Fokker–Planck regularization scheme is employed, and the corresponding 1D profile is derived by marginalizing over $\phi$. (b–d) Comparison of the free-energy profiles reconstructed by FPSL using approximately 41 ns of non-equilibrium MD data for the 1D (b), 2D (c), and 2D+ (d) methodologies.
Figure 4: Molecular systems studied: (a) Coarse-grained Martini 3 model of a C1P3 molecule permeating a POPC lipid bilayer. (b) All-atom model of an ethanol molecule permeating a POPC lipid bilayer. (c) Free-energy landscape of C1P3 permeating a POPC lipid bilayer, shown in panel (a), as a function of the normal distance $z$ from the bilayer midplane and $\cos\theta$, where $\theta$ is the polar angle of the molecule.
Figure 5: Performance evaluation of Fokker–Planck Score Learning (FPSL) for reconstructing the free-energy profile of C1P3 permeation through a POPC lipid bilayer. (a) Mean absolute error (MAE) as a function of aggregate MD simulation time. Comparison includes: FPSL trained on $z$ (1D); FPSL trained on $(z, \cos\theta)$ with biasing along $z$ (2D); FPSL trained on $(z, \cos\theta)$ with biasing along both $z$ and $\theta$ (2D+); and umbrella sampling with MBAR (purple). For the 2D methods, the 1D profile is obtained by marginalizing over $\cos\theta$. (b) Convergence improvement ($\Delta \text{MAE}$) achieved by learning the full 2D landscape (and subsequently marginalizing) compared to learning the 1D landscape directly. The yellow curve represents the 2D reconstruction using $\theta$ instead of $\cos\theta$. (c–e) Comparison of free-energy profiles reconstructed by FPSL using approximately $1\,\mu\mathrm{s}$ of MD simulation data.
...and 2 more figures
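
The captions of Figures 3 and 5 describe obtaining a 1D profile from a learned 2D landscape by marginalizing over the second coordinate. A minimal sketch of that step is given below, using the standard relation $F(x) = -k_\mathrm{B}T \ln \int e^{-F(x, y)/k_\mathrm{B}T}\,\mathrm{d}y$ on a uniform grid. The function name `marginalize_free_energy`, the nested-list layout of the grid, and the uniform integration measure are assumptions made for this example; the summary does not show the authors' code.

```python
import math


def marginalize_free_energy(F2d, dy, kT=1.0):
    """Marginalize a gridded 2D free energy F2d[i][j] = F(x_i, y_j) over y.

    Hypothetical helper for illustration. Assumes a uniform grid spacing dy
    and a uniform measure in y. Returns F(x_i) shifted so its minimum is 0.
    """
    profile = []
    for row in F2d:
        # Subtract the row minimum before exponentiating (log-sum-exp trick)
        # to avoid overflow or underflow for large free-energy differences.
        f_ref = min(row)
        weight = sum(math.exp(-(f - f_ref) / kT) for f in row) * dy
        profile.append(f_ref - kT * math.log(weight))
    f_min = min(profile)
    return [f - f_min for f in profile]


# Usage: a separable landscape F(x, y) = a(x) + b(y) marginalizes to a(x)
# up to an additive constant.
a = [0.0, 1.0, 2.5]
b = [0.0, 0.5, 1.0]
F2d = [[ai + bj for bj in b] for ai in a]
profile = marginalize_free_energy(F2d, dy=0.1)
```

If the second coordinate were an angle with a non-uniform measure, a Jacobian weight would have to be included in the sum; with a uniform measure, as assumed here, no extra factor is needed.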
