Data-Efficient Multidimensional Free Energy Estimation via Physics-Informed Score Learning
Daniel Nagel, Tristan Bereau
TL;DR
This work addresses the challenge of estimating high-dimensional free-energy landscapes (FELs) from molecular dynamics simulations by extending Fokker-Planck Score Learning (FPSL) to two dimensions. The method uses a physics-informed prior based on the non-equilibrium steady state in periodic domains. It enforces symmetries through Fourier features and applies Fokker-Planck-based regularization to stabilize learning in sparsely sampled regions. The approach is validated on three systems: alanine dipeptide, solute permeation through a coarse-grained lipid bilayer, and ethanol permeation through an all-atom lipid bilayer. In all three, it yields accurate 2D free-energy surfaces with significantly less sampling than traditional umbrella sampling or adaptive biasing force (ABF) methods. Because FPSL learns a smooth score function rather than histogram densities, it avoids the exponential scaling of grid-based estimators with the number of dimensions. This enables scalable multidimensional FEL estimation, with notable speedups and consistent robustness across systems. These results suggest that FPSL is a data-efficient, generalizable tool for characterizing the thermodynamics and kinetics of complex biomolecular systems in higher-dimensional collective-variable (CV) spaces.
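A short sketch can show how Fourier features can enforce periodicity and symmetry. It is a hypothetical illustration, not FPSL's actual feature map. It assumes each CV is periodic with a known period and encodes it as sines and cosines of integer harmonics. Any function built from these features is automatically periodic. Dropping the sine terms for one coordinate makes the representation even in that coordinate, which is one way to impose the mirror symmetry of a bilayer about its midplane. The helper names (`fourier_features`, `embed_2d`), the number of modes, and the box lengths are all assumptions chosen for illustration.

```python
import math


def fourier_features(x, period, n_modes, even_only=False):
    """Hypothetical helper: encode a periodic coordinate x as Fourier features.

    Returns cos(2*pi*k*x/period) and sin(2*pi*k*x/period) for k = 1..n_modes.
    With even_only=True only the cosines are kept, so every feature (and any
    model that sees only these features) is invariant under x -> -x.
    """
    features = []
    for k in range(1, n_modes + 1):
        angle = 2.0 * math.pi * k * x / period
        features.append(math.cos(angle))
        if not even_only:
            features.append(math.sin(angle))
    return features


def embed_2d(x, y, period_x, period_y, n_modes, mirror_y=False):
    """Hypothetical 2D embedding: concatenate the features of both CVs.

    mirror_y=True imposes y -> -y symmetry, e.g. for a membrane-normal
    coordinate measured from the bilayer midplane.
    """
    return (fourier_features(x, period_x, n_modes)
            + fourier_features(y, period_y, n_modes, even_only=mirror_y))


def close(a, b, tol=1e-9):
    """Element-wise comparison of two feature vectors."""
    return len(a) == len(b) and all(abs(u - v) < tol for u, v in zip(a, b))


TWO_PI = 2.0 * math.pi

# Periodicity: shifting either torsion angle by a full turn leaves the features unchanged.
phi, psi = 0.3, -1.1
assert close(embed_2d(phi, psi, TWO_PI, TWO_PI, 4),
             embed_2d(phi + TWO_PI, psi - TWO_PI, TWO_PI, TWO_PI, 4))

# Mirror symmetry: with mirror_y=True, z and -z map to identical features.
box_z = 8.0   # hypothetical box length along the membrane normal
period_q = 5.0  # hypothetical period of a second, unspecified periodic CV
q, z = 1.7, 2.5
assert close(embed_2d(q, z, period_q, box_z, 4, mirror_y=True),
             embed_2d(q, -z, period_q, box_z, 4, mirror_y=True))
```

One motivation for this design is that building a symmetry into the input guarantees it exactly, without relying on the model to learn it from data. Samples from symmetry-related regions then inform the same function values, which is consistent with the accuracy gains from exploiting symmetries that the abstract reports.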
Abstract
Many biological processes involve numerous coupled degrees of freedom, yet free-energy estimation is often restricted to one-dimensional profiles to mitigate the high computational cost of multidimensional sampling. In this work, we extend Fokker-Planck Score Learning (FPSL) to efficiently reconstruct two-dimensional free-energy landscapes from non-equilibrium molecular dynamics simulations, using different types of collective variables. We show that explicitly modeling orthogonal degrees of freedom reveals insights hidden in one-dimensional projections at negligible computational overhead. Additionally, exploiting symmetries in the underlying landscape enhances reconstruction accuracy, while regularization techniques ensure numerical robustness in sparsely sampled regions. We validate our approach on three distinct systems: the conformational dynamics of alanine dipeptide, as well as coarse-grained and all-atom models of solute permeation through lipid bilayers. We demonstrate that, because FPSL learns a smooth score function rather than histogram-based densities, it overcomes the exponential scaling of grid-based methods, establishing it as a data-efficient and scalable tool for multidimensional free-energy estimation.
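Simple arithmetic makes the exponential scaling of grid-based methods concrete. A histogram with n bins per CV has n^d cells in d dimensions, so for a fixed amount of sampling the average occupancy per cell falls geometrically with d. The sample count and bin resolution below are hypothetical values chosen only to show the trend. They are not taken from the paper's benchmarks, and the helper names are illustrative.

```python
def histogram_cells(bins_per_dim, n_dims):
    """Number of cells in a regular grid with the same resolution along every CV."""
    return bins_per_dim ** n_dims


def samples_per_cell(n_samples, bins_per_dim, n_dims):
    """Average samples per cell, assuming (optimistically) uniform coverage."""
    return n_samples / histogram_cells(bins_per_dim, n_dims)


n_samples = 100_000  # hypothetical number of uncorrelated frames
bins = 50            # hypothetical resolution per collective variable

for d in (1, 2, 3):
    print(d, histogram_cells(bins, d), samples_per_cell(n_samples, bins, d))
# 1 50 2000.0
# 2 2500 40.0
# 3 125000 0.8
```

Uniform coverage is a best case. Real trajectories concentrate in low-free-energy basins, so barrier regions of a 2D or 3D grid are sparser still. This sparsity is what regularization in sparsely sampled regions has to address. A smooth score model shares information between neighboring points instead of estimating each cell independently.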
