Bayesian learning for accurate and robust biomolecular force fields
Vojtech Kostal, Brennon L. Shanks, Pavel Jungwirth, Hector Martinez-Seara
TL;DR
The paper addresses the problem of parameterizing biomolecular force fields with quantified uncertainty. It introduces a Bayesian framework that learns partial charges from ab initio MD data in explicit solvent, coupled with a computationally efficient Local Gaussian Process surrogate that makes likelihood evaluation tractable during inference. The resulting parameter estimates are robust and transferable across diverse molecular fragments. The authors report improved agreement with high-level references and experimental observables, with sub-percent accuracy for densities and reasonable accuracy for solvation and binding properties, while the posterior distributions provide principled uncertainty quantification. To test transferability, they apply fragment-derived charges to calcium binding in cardiac troponin C and find close agreement with experimental binding free energies, which highlights the method’s potential to bridge electronic-structure accuracy and classical-scale simulations. The approach is positioned as a general, open framework for uncertainty-aware force-field development that can integrate diverse data sources and scale with advances in GPU-accelerated inference, although challenges remain for very high-dimensional parameter spaces and for obtaining representative training data.
Abstract
Molecular dynamics is a valuable tool to probe biological processes at the atomistic level, a resolution often elusive to experiments. However, the credibility of molecular models is limited by the accuracy of the underlying force field, which is often parametrized using ad hoc assumptions. To address this gap, we present a Bayesian framework for learning physically grounded parameters directly from ab initio molecular dynamics data. By representing both model parameters and data probabilistically, the framework yields interpretable, statistically rigorous models in which uncertainty and transferability emerge naturally from the learning process. This approach provides a transparent, data-driven foundation for developing predictive molecular models and enhances confidence in computational descriptions of biophysical systems. We demonstrate the method using 18 biologically relevant molecular fragments that capture key motifs in proteins, nucleic acids, and lipids, and, as a proof of concept, apply it to calcium binding to troponin, a central event in cardiac regulation.
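The general idea of surrogate-assisted Bayesian inference described above can be illustrated with a deliberately small toy problem. The sketch below is not the authors' implementation: the one-parameter "simulator", the uniform prior, the kernel settings, the noise level, and the reading of "local" as "fit the Gaussian process only to the nearest training points" are all assumptions made for illustration, and every function name is hypothetical. It shows how a cheap surrogate can stand in for an expensive simulation inside the likelihood, and how the posterior samples then carry an uncertainty estimate for the parameter.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def local_gp_predict(x_query, x_train, y_train, k=8, jitter=1e-6):
    """GP mean and variance at x_query using only the k nearest training points.

    Assumption: 'local' here means a nearest-neighbour subset; the paper's
    Local Gaussian Process may be constructed differently.
    """
    idx = np.argsort(np.abs(x_train - x_query))[:k]
    xs, ys = x_train[idx], y_train[idx]
    K = rbf_kernel(xs, xs) + jitter * np.eye(len(xs))
    xq = np.array([x_query])
    k_star = rbf_kernel(xq, xs)[0]
    y_mean = ys.mean()  # constant mean estimated from the local subset
    mean = y_mean + k_star @ np.linalg.solve(K, ys - y_mean)
    var = rbf_kernel(xq, xq)[0, 0] - k_star @ np.linalg.solve(K, k_star)
    return mean, max(var, 0.0)

def toy_simulator(q):
    """Hypothetical stand-in for an expensive simulation observable."""
    return np.tanh(2.0 * q) + 0.5 * q

def log_posterior(q, y_ref, sigma_ref, x_train, y_train):
    """Uniform prior on [-1, 1] plus a Gaussian likelihood from the surrogate."""
    if not (-1.0 <= q <= 1.0):
        return -np.inf
    mean, var = local_gp_predict(q, x_train, y_train)
    total_var = sigma_ref ** 2 + var  # reference noise plus surrogate variance
    return -0.5 * ((y_ref - mean) ** 2 / total_var + np.log(2.0 * np.pi * total_var))

def metropolis(log_post, q0, n_steps, step, rng):
    """Random-walk Metropolis sampler for a scalar parameter."""
    samples = np.empty(n_steps)
    q, lp = q0, log_post(q0)
    for i in range(n_steps):
        q_new = q + step * rng.normal()
        lp_new = log_post(q_new)
        if np.log(rng.uniform()) < lp_new - lp:
            q, lp = q_new, lp_new
        samples[i] = q
    return samples

rng = np.random.default_rng(0)

# Training set: a modest number of "expensive" simulator runs on a grid.
x_train = np.linspace(-1.0, 1.0, 41)
y_train = toy_simulator(x_train)

# Synthetic reference observable generated at a known parameter value.
q_true = 0.2
y_ref = toy_simulator(q_true)
sigma_ref = 0.05

samples = metropolis(
    lambda q: log_posterior(q, y_ref, sigma_ref, x_train, y_train),
    q0=0.0, n_steps=5000, step=0.1, rng=rng,
)
posterior = samples[1000:]  # discard burn-in
print(f"posterior mean = {posterior.mean():.3f}, std = {posterior.std():.3f}")
```

In this toy setting the sampler never calls the simulator itself; each likelihood evaluation costs only a small linear solve, which is what makes many thousands of posterior evaluations affordable. A realistic force-field application would involve many parameters and vector-valued observables, so the sketch should be read as a conceptual illustration only.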
