Enhanced Sampling for Efficient Learning of Coarse-Grained Machine Learning Potentials
Weilong Chen, Franz Görlich, Paul Fuchs, Julija Zavadlav
TL;DR
This work tackles two problems in learning coarse-grained machine learning potentials via force matching: data inefficiency and poor sampling of transition regions. It proves that mean forces are invariant under a bias applied along CG coordinates when forces are recomputed with the unbiased potential, so biased data can be used without reweighting. By employing umbrella sampling and well-tempered metadynamics, the authors accelerate data generation and enrich transition-region sampling, as demonstrated on the Müller–Brown potential and capped alanine in water. The approach yields more accurate and stable CG PMFs without incorporating physics priors, highlighting enhanced sampling as a practical framework for data-efficient CG modeling with potential for broader application.
Abstract
Coarse-graining (CG) enables molecular dynamics (MD) simulations of larger systems and longer timescales that are otherwise infeasible with atomistic models. Machine learning potentials (MLPs), with their capacity to capture many-body interactions, can provide accurate approximations of the potential of mean force (PMF) in CG models. Current CG MLPs are typically trained in a bottom-up manner via force matching, which in practice relies on configurations sampled from the unbiased equilibrium Boltzmann distribution to ensure thermodynamic consistency. This convention poses two key limitations: first, sufficiently long atomistic trajectories are needed to reach convergence; and second, even once equilibrated, transition regions remain poorly sampled. To address these issues, we employ enhanced sampling to bias along CG degrees of freedom for data generation, and then recompute the forces with respect to the unbiased potential. This strategy simultaneously shortens the simulation time required to produce equilibrated data and enriches sampling in transition regions, while preserving the correct PMF. We demonstrate its effectiveness on the Müller–Brown potential and capped alanine in water, where it yields more accurate and more stable CG PMFs. Our findings support the use of enhanced sampling for force matching as a promising direction to improve the accuracy and reliability of CG MLPs.
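The central step, biasing along a CG coordinate and then recomputing forces with the unbiased potential, can be illustrated with a minimal sketch. The reasoning is that a bias depending only on the CG coordinate is constant over all fine-grained configurations that map to the same CG value, so it cancels in the conditional average that defines the mean force. Everything below is an assumption made for illustration and is not taken from the paper: the 2D toy potential (a double well in x with an x-dependent stiffness in y, chosen because its mean force along x is known analytically, not the Müller–Brown surface), the mapping (x, y) -> x, a single harmonic umbrella window on the barrier top, Metropolis Monte Carlo in place of MD, and all parameter values.

```python
import numpy as np

BETA = 1.0    # inverse temperature in reduced units (assumed)
KAPPA = 10.0  # umbrella spring constant (illustrative value)
X0 = 0.0      # umbrella centre, placed on the barrier top of the toy potential


def potential(x, y):
    """Toy unbiased potential: double well in x, x-dependent stiffness in y."""
    return (x**2 - 1.0) ** 2 + 0.5 * (1.0 + x**2) * y**2


def unbiased_force_x(x, y):
    """-dU/dx of the unbiased potential (force projected on the CG coordinate x)."""
    return -4.0 * x * (x**2 - 1.0) - x * y**2


def bias(x):
    """Harmonic umbrella bias that depends on the CG coordinate only."""
    return 0.5 * KAPPA * (x - X0) ** 2


def bias_force_x(x):
    return -KAPPA * (x - X0)


def exact_mean_force(x):
    """Analytic -dA/dx with A(x) = (x^2 - 1)^2 + ln(1 + x^2) / (2 * beta) + const."""
    return -4.0 * x * (x**2 - 1.0) - x / (BETA * (1.0 + x**2))


def sample_biased(n_walkers=4000, n_steps=2000, burn_in=500, stride=10,
                  step=0.5, seed=0):
    """Metropolis sampling of exp(-beta * (U + bias)) with independent walkers."""
    rng = np.random.default_rng(seed)
    x = np.full(n_walkers, X0)
    y = np.zeros(n_walkers)
    energy = potential(x, y) + bias(x)
    xs, ys = [], []
    for t in range(n_steps):
        x_new = x + step * rng.normal(size=n_walkers)
        y_new = y + step * rng.normal(size=n_walkers)
        e_new = potential(x_new, y_new) + bias(x_new)
        # Metropolis acceptance; the minimum avoids overflow in exp.
        p_acc = np.exp(np.minimum(0.0, -BETA * (e_new - energy)))
        accept = rng.random(n_walkers) < p_acc
        x = np.where(accept, x_new, x)
        y = np.where(accept, y_new, y)
        energy = np.where(accept, e_new, energy)
        if t >= burn_in and t % stride == 0:
            xs.append(x.copy())
            ys.append(y.copy())
    return np.concatenate(xs), np.concatenate(ys)


x, y = sample_biased()

# Forces recomputed from the unbiased potential on the biased configurations.
f_recomputed = unbiased_force_x(x, y)
# Forces the biased simulation actually experiences (bias force included).
f_biased = f_recomputed + bias_force_x(x)


def slope_of_residual(forces):
    """Least-squares slope (through X0) of force minus exact mean force versus x."""
    resid = forces - exact_mean_force(x)
    return np.sum((x - X0) * resid) / np.sum((x - X0) ** 2)


slope_recomputed = slope_of_residual(f_recomputed)  # expected near 0
slope_biased = slope_of_residual(f_biased)          # expected near -KAPPA

# Bin-averaged deviation of the recomputed forces from the exact mean force.
edges = np.linspace(-0.6, 0.6, 7)
which = np.digitize(x, edges) - 1
bin_error = np.array([
    np.mean((f_recomputed - exact_mean_force(x))[which == b])
    for b in range(len(edges) - 1)
])

print("residual slope, recomputed forces:", slope_recomputed)
print("residual slope, biased forces:   ", slope_biased)
print("per-bin error, recomputed forces:", bin_error)
```

In this sketch the recomputed forces average to the analytic mean force in every bin without any reweighting, whereas keeping the bias force in the labels shifts the estimate by the bias gradient. The umbrella window concentrates samples around the barrier at x = 0, which is the transition region that unbiased sampling of this toy double well would visit only rarely.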
