Automatic Forward Model Parameterization with Bayesian Inference of Conformational Populations
Robert M. Raddi, Tim Marshall, Vincent A. Voelz
TL;DR
The paper addresses the challenge of aligning forward-model predictions with ensemble-averaged experimental data by parameterizing forward-model (FM) relations within a Bayesian framework (BICePs). It introduces two theoretically equivalent optimization routes: (i) posterior sampling of FM parameters inside the full posterior $p(X,\sigma,\theta \mid D)$ with replica-averaged observables, and (ii) variational minimization of the BICePs score $f(\theta) = -\ln\left(Z(\theta)/Z_0\right)$, where $Z(\theta) = \iint \exp(-u(\mathbf{X},\sigma;\theta))\,d\mathbf{X}\,d\sigma$; gradients are obtained via MBAR and ensemble averages. In toy-model tests and on human ubiquitin data, the Good-and-Bad likelihood handles outliers robustly, outperforming Gaussian and SVD approaches, and the refined parameters transfer across priors such as the 1D3Z, 2NR2, and RosettaFold2 (RF2) ensembles. Importantly, the framework extends to differentiable FMs such as neural networks, demonstrated by a proof-of-concept NN trained with the BICePs loss, reinforcing its potential as a general, uncertainty-aware learning strategy for complex observables.
Abstract
To quantify how well theoretical predictions of structural ensembles agree with experimental measurements, we depend on the accuracy of forward models. These models are computational frameworks that generate observable quantities from molecular configurations based on empirical relationships linking specific molecular properties to experimental measurements. Bayesian Inference of Conformational Populations (BICePs) is a reweighting algorithm that reconciles simulated ensembles with ensemble-averaged experimental observations, even when such observations are sparse and/or noisy. This is achieved by sampling the posterior distribution of conformational populations under experimental restraints, together with the posterior distribution of uncertainties due to random and systematic error. In this study, we enhance the algorithm for the refinement of empirical forward model (FM) parameters. We introduce and evaluate two novel methods for optimizing FM parameters. The first method treats FM parameters as nuisance parameters, integrating over them in the full posterior distribution. The second method employs variational minimization of a quantity called the BICePs score, which reports the free energy of "turning on" the experimental restraints. This technique, coupled with improved likelihood functions for handling experimental outliers, facilitates force field validation and optimization, as illustrated in recent studies (Raddi et al. 2023, 2024). Using this approach, we refine the parameters of the Karplus relation, which is crucial for accurate predictions of J-coupling constants from dihedral angles between interacting nuclei. We validate this approach first with a toy model system, and then for human ubiquitin, predicting six sets of Karplus parameters. Finally, we demonstrate that our framework naturally generalizes optimization to any differentiable forward model...
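To make the forward-model idea concrete, the following is a minimal sketch of a Karplus-type relation and a population-weighted ensemble average. It assumes the common functional form $J(\phi) = A\cos^2(\phi + \phi_0) + B\cos(\phi + \phi_0) + C$. The function names, the parameter values, and the three-state ensemble are hypothetical placeholders chosen for illustration; they are not taken from the paper or from the BICePs software.

```python
import numpy as np


def karplus(phi, A, B, C, phi0=0.0):
    """Karplus-type forward model: J = A cos^2(phi + phi0) + B cos(phi + phi0) + C.

    Angles are in radians. A, B, C, and phi0 are the forward-model parameters.
    """
    x = np.cos(np.asarray(phi, dtype=float) + phi0)
    return A * x**2 + B * x + C


def ensemble_average(values, populations):
    """Population-weighted average of a per-state observable."""
    w = np.asarray(populations, dtype=float)
    w = w / w.sum()  # normalize the weights so they sum to 1
    return float(np.dot(w, np.asarray(values, dtype=float)))


# Hypothetical three-state ensemble: one dihedral angle per conformational state.
phi = np.deg2rad([-60.0, -120.0, 60.0])
populations = [0.5, 0.3, 0.2]

# Illustrative parameter values only (assumption, not from the paper).
J_states = karplus(phi, A=6.5, B=-1.8, C=1.6, phi0=np.deg2rad(-60.0))
J_avg = ensemble_average(J_states, populations)
print(J_states, J_avg)
```

In this picture, refining the forward model means adjusting `A`, `B`, `C`, and `phi0` so that the ensemble-averaged prediction agrees with the measured J-couplings, while the populations and the uncertainty parameters are inferred at the same time.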
