Automated optimization of force field parameters against ensemble-averaged measurements with Bayesian Inference of Conformational Populations

Robert M. Raddi; Vincent A. Voelz

TL;DR

The paper tackles the problem of parameterizing molecular force fields to reproduce ensemble-averaged experimental observables in the presence of forward-model and data uncertainty. It extends the Bayesian Inference of Conformational Populations (BICePs) framework to automated force-field refinement by variationally minimizing the BICePs score $f(\epsilon)$ while sampling the posterior over conformational populations $X$ and uncertainty $\sigma$, with the first and second derivatives of the score derived via MBAR. It introduces a robust Student's likelihood to down-weight outliers and demonstrates the approach on toy HP lattice and polymer models, including multi-parameter refinements and integration with PyTorch for simultaneous optimization of multiple $\epsilon$ parameters; the results show accurate recovery of parameters and resilience to systematic errors. The work provides an open-source, scalable path toward robust parameterization of both physics-based and neural-network potentials using ensemble-averaged data, with broad applicability to transferable force fields and complex forward models.

Abstract

Accurate force fields are essential for reliable molecular simulations. These models are refined against quantum mechanical calculations and experimental measurements, which are subject to random and systematic errors. Bayesian Inference of Conformational Populations (BICePs) is a reweighting algorithm that reconciles simulated ensembles with sparse or noisy observables by sampling the full posterior distribution of conformational populations and experimental uncertainty. In this method, a metric called the BICePs score is used to perform model selection, by calculating the free energy of "turning on" the conformational populations under experimental restraints. This approach, when used with improved likelihood functions to deal with experimental outliers, can be used for force field validation (Raddi et al. 2025). Here, we extend the BICePs approach to perform automated force field refinement while simultaneously sampling the full distribution of uncertainties, using a variational method to minimize the BICePs score. To demonstrate the utility of this method, we refine multiple interaction parameters for a 12-mer HP lattice model using ensemble-averaged distance measurements as restraints. To illustrate the resilience of BICePs in the presence of unknown random and systematic errors, we assess the performance of our algorithm through repeated optimizations and under various extents of experimental error. Our results suggest that variational optimization of the BICePs score is a promising direction for robust and automatic parameterization of molecular potentials.
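The abstract's point about likelihood functions that tolerate experimental outliers can be illustrated with a minimal sketch. It compares a Gaussian negative log-likelihood with a generic location-scale Student's t one when a single measurement carries a systematic shift. This is not the BICePs Student's model itself: the function names, the residual values, `sigma`, and `nu` are all assumptions chosen for illustration.

```python
import math

def gaussian_nll(residuals, sigma):
    """Negative log-likelihood of residuals under a zero-mean Gaussian."""
    log_norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(0.5 * (r / sigma) ** 2 - log_norm for r in residuals)

def student_t_nll(residuals, sigma, nu=2.0):
    """Negative log-likelihood under a location-scale Student's t
    with nu degrees of freedom (a generic form, assumed here)."""
    log_norm = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return sum(0.5 * (nu + 1.0) * math.log1p((r / sigma) ** 2 / nu) - log_norm
               for r in residuals)

# Hypothetical residuals (prediction minus measurement), arbitrary units.
clean = [0.10, -0.20, 0.05, 0.15]
corrupted = clean[:-1] + [3.0]   # last measurement carries a systematic shift
sigma = 0.5                      # assumed uncertainty scale

# Extra penalty each likelihood assigns because of the single outlier.
gauss_penalty = gaussian_nll(corrupted, sigma) - gaussian_nll(clean, sigma)
student_penalty = student_t_nll(corrupted, sigma) - student_t_nll(clean, sigma)
print(f"Gaussian: {gauss_penalty:.2f}  Student's t: {student_penalty:.2f}")
```

The Gaussian penalty grows quadratically with the size of the outlier, while the Student's t penalty grows only logarithmically, so a single corrupted distance pulls far less on the inferred parameters.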

Paper Structure (7 sections, 45 equations, 24 figures)

Introduction
Theory
Methods
Results and Discussion
Conclusion
Deriving the first and second derivatives of the BICePs score.
Kofinger & Hummer's toy polymer model

Figures (24)

Figure 1: Force field optimization by variationally minimizing the BICePs score. Given ensemble-averaged experimental observables $D$ and a simulated ensemble $p_{\lambda}(X|\epsilon)$ generated from molecular simulation using initial force field parameters $\epsilon$, an automated procedure is used to find the optimal parameters $\epsilon^{*}$ that best match experiment. The procedure is guided by the first and second derivatives of the BICePs score at each iteration to propose new values of $\epsilon$. This cycle is iterated until convergence is reached.
Figure 2: (a) The folded state of the 12-mer HP lattice model protein with sequence HPHPHPHPPHPH. Highlighted in yellow are the favorable non-bonded contacts between the hydrophobic residues (black beads). (b) Extended diagram of the eight distance measurements used in this work.
Figure 3: (a) 1-D scans over $\epsilon$ to reveal the landscape of the BICePs score (blue dots). The green tangent lines are the computed derivatives at each $\epsilon$ value. Uncertainties in the BICePs scores and their derivatives come from five independent scans along $\epsilon$. (b) The derivative of the BICePs score (blue dots) and the second derivative of the BICePs score (yellow lines) at each $\epsilon$ value. The dotted black line at $\epsilon=1.0$ shows the true value, which is where the derivative of the BICePs score equals zero. BICePs calculations are run using the Student's model for 100k steps with 8 replicas.
Figure 4: Comparative analysis of the performance of the Student's likelihood model (blue) and a Gaussian likelihood model (red) when random and systematic error of varying magnitude ($\sigma_{data}$) is introduced to the 2-11 and 0-11 distances. The vertical axis shows the derivative of the BICePs score evaluated at the "true" target value, $\epsilon=1$. This value corresponds to the "true" minimum, illustrated in Figure 3. The Gaussian likelihood's derivative becomes notably less dependable when the data incorporate errors, especially when $\sigma_{data}$ exceeds 0.75, surpassing one standard deviation. The values shown were calculated using 5000 random perturbations to the 2-11 and 0-11 distances, and represent the average of 300 BICePs calculations. Error bars represent the standard deviation.
Figure 5: Average traces over a total of 25 independent rounds of parameter $(\epsilon_{2}, \epsilon_{4})$ optimizations using the second-order (trust-ncg) method with BICePs, for a maximum of ten iterations. Optimizations converge to the same parameters ($\epsilon_{2}^{\text{True}}=1.25$, $\epsilon_{4}^{\text{True}}=1.5$) when starting from different initial parameters $(\epsilon_{2}^{0}, \epsilon_{4}^{0})$ = $\{(0.5, 5.0), (4.0, 5.0), (4.0, 0.5)\}$. Average BICePs optimized parameters were determined to be $\epsilon_2$ = 1.24 $\pm$ 0.41 and $\epsilon_4$ = 1.31 $\pm$ 0.33, where the uncertainties are estimated from the inverse Hessian. The BICePs score landscape was generated from the average values of five scans over $\epsilon_{2}$ and $\epsilon_{4}$. All calculations used the Student's model with 200k MCMC steps and 32 replicas. The experimental data are corrupted with systematic error in the 2-11 and 4-9 distances, shifted by +3 Å and +3.5 Å, respectively. The total error in the data is $\sigma_{data} = 1.63$.
...and 19 more figures
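
The captions for Figures 1, 3 and 5 describe an iterative cycle in which the first and second derivatives of the score propose new values of $\epsilon$ until convergence. The sketch below shows that cycle as a step-capped Newton iteration on a hypothetical one-parameter landscape. It is a simple stand-in, not the trust-ncg optimizer named in Figure 5 and not the BICePs score: `toy_score`, its derivatives, and the settings `max_step` and `tol` are assumptions made for illustration.

```python
def toy_score(eps):
    """Hypothetical stand-in for a score landscape with its minimum at eps = 1."""
    d = eps - 1.0
    return d ** 2 + 0.1 * d ** 4

def toy_score_grad(eps):
    """Analytic first derivative of toy_score."""
    d = eps - 1.0
    return 2.0 * d + 0.4 * d ** 3

def toy_score_hess(eps):
    """Analytic second derivative of toy_score."""
    d = eps - 1.0
    return 2.0 + 1.2 * d ** 2

def refine(eps0, max_iter=50, max_step=2.0, tol=1e-8):
    """Iterate Newton steps from eps0 until the gradient magnitude drops below tol."""
    eps = eps0
    trace = [eps]
    for _ in range(max_iter):
        g = toy_score_grad(eps)
        if abs(g) < tol:
            break
        h = toy_score_hess(eps)
        # Newton step; fall back to a plain gradient step if curvature is not positive.
        step = -g / h if h > 0.0 else -g
        # Cap the step length, a crude analogue of a trust region.
        step = max(-max_step, min(max_step, step))
        eps += step
        trace.append(eps)
    return eps, trace

# Start from different initial parameters, as in the repeated optimizations.
results = [refine(eps0)[0] for eps0 in (0.5, 4.0)]
print(results)
```

In the paper's setting each evaluation of the score and its derivatives would come from a BICePs calculation rather than a closed-form function, so the derivatives are noisy estimates; the cap on the step length is one simple way to keep a proposal from overshooting when that happens.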
