Automatic Forward Model Parameterization with Bayesian Inference of Conformational Populations

Robert M. Raddi; Tim Marshall; Vincent A. Voelz

TL;DR

The paper addresses the challenge of aligning forward-model predictions with ensemble-averaged experimental data by parameterizing forward-model (FM) relations within a Bayesian framework (BICePs). It introduces two theoretically equivalent optimization routes: (i) posterior sampling of FM parameters within the full posterior $p(\mathbf{X},\sigma,\theta|D)$ with replica-averaged observables, and (ii) variational minimization of the BICePs score $f(\theta) = -\ln\left(Z(\theta)/Z_0\right)$, where $Z(\theta) = \iint \exp(-u(\mathbf{X},\sigma;\theta))\,d\mathbf{X}\,d\sigma$; gradients are obtained via MBAR and ensemble averages. In toy-model tests and on human ubiquitin data, the Good-and-Bad likelihood handles outliers robustly, outperforming the Gaussian likelihood and singular value decomposition (SVD) fitting, and the refined parameters transfer across prior ensembles such as 1D3Z, 2NR2, and RosettaFold2 (RF2). The framework also extends to differentiable FMs such as neural networks, demonstrated by a proof-of-concept NN trained with the BICePs loss, which supports its potential as a general, uncertainty-aware learning strategy for complex observables.
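To make the score $f(\theta) = -\ln\left(Z(\theta)/Z_0\right)$ concrete, the following is a minimal sketch under strong simplifying assumptions that are not taken from the paper: a handful of discrete states, a single observable, a Gaussian likelihood, and a fixed uncertainty $\sigma$ (so the integral over $\sigma$ is dropped). The function name `biceps_score`, the populations, and the predicted values are all hypothetical; `predictions` stands in for the forward-model output $g(\mathbf{X};\theta)$ at one choice of $\theta$.

```python
import math

def biceps_score(prior, predictions, d_exp, sigma):
    """Toy f = -ln(Z / Z0) over discrete states with a fixed sigma (illustrative only)."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    # Z: prior-weighted Gaussian likelihood of the measurement, summed over states
    z = sum(p * norm * math.exp(-0.5 * ((d_exp - g) / sigma) ** 2)
            for p, g in zip(prior, predictions))
    # Z0: the same sum with the experimental restraint switched off (likelihood = 1)
    z0 = sum(prior)
    return -math.log(z / z0)

prior = [0.35, 0.50, 0.15]   # hypothetical prior state populations
d_exp = 5.0                  # hypothetical measurement (Hz)

# Two hypothetical forward-model parameter choices, expressed by their predictions
good = biceps_score(prior, [4.8, 5.1, 5.3], d_exp, sigma=0.5)
bad = biceps_score(prior, [8.0, 9.0, 1.0], d_exp, sigma=0.5)
print(good, bad)
```

In this toy setting, the parameter choice whose predictions sit closer to the measurement yields the lower score, which is the quantity a variational minimization over $\theta$ would drive down.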

Abstract

To quantify how well theoretical predictions of structural ensembles agree with experimental measurements, we depend on the accuracy of forward models. These models are computational frameworks that generate observable quantities from molecular configurations based on empirical relationships linking specific molecular properties to experimental measurements. Bayesian Inference of Conformational Populations (BICePs) is a reweighting algorithm that reconciles simulated ensembles with ensemble-averaged experimental observations, even when such observations are sparse and/or noisy. This is achieved by sampling the posterior distribution of conformational populations under experimental restraints, as well as the posterior distribution of uncertainties due to random and systematic error. In this study, we extend the algorithm to refine empirical forward model (FM) parameters. We introduce and evaluate two novel methods for optimizing FM parameters. The first method treats FM parameters as nuisance parameters, integrating over them in the full posterior distribution. The second method employs variational minimization of a quantity called the BICePs score, which reports the free energy of "turning on" the experimental restraints. This technique, coupled with improved likelihood functions for handling experimental outliers, facilitates force field validation and optimization, as illustrated in recent studies (Raddi et al. 2023, 2024). Using this approach, we refine parameters that modulate the Karplus relation, which is crucial for accurate predictions of J-coupling constants based on dihedral angles between interacting nuclei. We validate this approach first with a toy model system, and then for human ubiquitin, predicting six sets of Karplus parameters. Finally, we demonstrate that our framework naturally generalizes optimization to any differentiable forward model...
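As an illustration of the kind of forward model being refined, the sketch below evaluates the commonly used three-coefficient Karplus form, $J(\phi) = A\cos^2(\phi+\phi_0) + B\cos(\phi+\phi_0) + C$, and a population-weighted ensemble average. The function names, the coefficient values, and the state angles and weights are placeholders chosen for illustration; they are not values reported in the paper.

```python
import math

def karplus(phi_deg, A, B, C, phi0_deg=0.0):
    """J-coupling (Hz) from a dihedral angle in degrees via the Karplus relation."""
    x = math.radians(phi_deg + phi0_deg)
    return A * math.cos(x) ** 2 + B * math.cos(x) + C

def ensemble_average(phis_deg, weights, A, B, C, phi0_deg=0.0):
    """Population-weighted average of J over conformational states."""
    total = sum(weights)
    return sum(w * karplus(p, A, B, C, phi0_deg)
               for p, w in zip(phis_deg, weights)) / total

# Placeholder coefficients (Hz) and phase shift (degrees), for illustration only
A, B, C, phi0 = 6.5, -1.8, 1.6, -60.0

j_single = karplus(-60.0, A, B, C, phi0)
j_avg = ensemble_average([-110.0, -60.0, 60.0], [0.35, 0.50, 0.15], A, B, C, phi0)
print(j_single, j_avg)
```

Because the relation is linear in $A$, $B$, and $C$ for a fixed phase shift, the coefficients can be fit by linear least squares when the angles are known; the difficulty the abstract addresses arises when the angles come from an uncertain ensemble and the data contain errors.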

Paper Structure (7 sections, 36 equations, 24 figures, 4 tables)

Introduction
Theory
Forward model optimization by posterior sampling of forward model parameters
Forward model optimization by variational minimization of the BICePs score
Results
Discussion
Conclusion

Figures (24)

Figure 1: A toy model for measuring the performance of forward model optimization. The $\phi$-angles for each conformational state are drawn from a multi-modal distribution with corresponding energies. (a) This multi-modal distribution of $\phi$-angles is intended to represent configurations with different secondary structure elements and has three distinct modes, each described by a mean ($\mu$), standard deviation ($\sigma$), and weight ($w$): beta sheets ($\mu=-110^{\circ}$, $\sigma=20^{\circ}$, $w=0.35$), right-handed helices ($\mu=-60^{\circ}$, $\sigma=10^{\circ}$, $w=0.5$), and left-handed helices ($\mu=60^{\circ}$, $\sigma=5^{\circ}$, $w=0.15$). (b) Cartoon representation of the backbone torsion angle, $\phi$.
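The three-mode distribution described in the Figure 1 caption can be sketched as a Gaussian mixture. The means, standard deviations, and weights below are taken from the caption; the sampling procedure itself (pick a mode by weight, then draw from a normal distribution), the function name `sample_phi`, and the seed are assumptions, and the sketch neither wraps angles to a periodic range nor assigns the energies mentioned in the caption.

```python
import random

# (mean, standard deviation, weight) in degrees, from the Figure 1 caption
MODES = [
    (-110.0, 20.0, 0.35),  # beta sheets
    (-60.0, 10.0, 0.50),   # right-handed helices
    (60.0, 5.0, 0.15),     # left-handed helices
]

def sample_phi(n, seed=0):
    """Draw n phi angles (degrees) from the three-mode Gaussian mixture."""
    rng = random.Random(seed)
    weights = [w for _, _, w in MODES]
    # Choose a mode for each sample according to the mixture weights
    picks = rng.choices(MODES, weights=weights, k=n)
    # Then draw the angle from that mode's normal distribution
    return [rng.gauss(mu, sd) for mu, sd, _ in picks]

phis = sample_phi(10000)
frac_left = sum(1 for p in phis if p > 0) / len(phis)
print(f"fraction with phi > 0: {frac_left:.3f}")
```

Since only the left-handed helix mode has appreciable density at positive angles, the fraction of samples with $\phi > 0$ should come out close to that mode's weight of 0.15.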
Figure 2: Comparison of the performance of the Good-Bad likelihood model (red), a Gaussian likelihood model (blue), and singular value decomposition (SVD) using the "true" $\phi$ angles with synthetic experimental data. Random and systematic errors of varying magnitude ($\sigma_{\text{data}}$) were added to the experimental scalar couplings. Model performance was measured as the RMSE (Hz) between the "true" scalar couplings and the couplings generated from the Karplus relations with the predicted Karplus coefficients, over 1,500 random perturbations to the experimental data; values represent the average of 100 BICePs calculations. Error bars represent the standard deviation. Predictions from SVD and the Gaussian likelihood model become notably less reliable when the data contain errors, especially when $\sigma_{\text{data}}$ exceeds 0.5 Hz.
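The baseline fit with known $\phi$ angles can be illustrated with ordinary linear least squares, since the Karplus relation is linear in its coefficients. The sketch below solves the normal equations by Gaussian elimination rather than using SVD (a deliberate substitution to stay within the standard library; for a well-conditioned design matrix both give the same least-squares solution). The "true" coefficients, sample size, noise level, and all function names are hypothetical, and only Gaussian noise is added, not the systematic error described in the caption.

```python
import math
import random

def design_row(phi_deg, phi0_deg=0.0):
    """Row [cos^2, cos, 1] so that J = A*cos^2 + B*cos + C is linear in (A, B, C)."""
    c = math.cos(math.radians(phi_deg + phi0_deg))
    return [c * c, c, 1.0]

def solve3(m, v):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    n = 3
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        a[i], a[piv] = a[piv], a[i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for k in range(i, n + 1):
                a[r][k] -= f * a[i][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x

def fit_karplus(phis, couplings, phi0_deg=0.0):
    """Least-squares (A, B, C) from the normal equations (X^T X) beta = X^T y."""
    rows = [design_row(p, phi0_deg) for p in phis]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, couplings)) for i in range(3)]
    return solve3(xtx, xty)

rng = random.Random(1)
true_abc = (6.5, -1.8, 1.6)  # hypothetical "true" coefficients (Hz)
phis = [rng.uniform(-180.0, 180.0) for _ in range(200)]
clean = [sum(c * x for c, x in zip(true_abc, design_row(p))) for p in phis]
noisy = [j + rng.gauss(0.0, 0.5) for j in clean]  # sigma_data = 0.5 Hz, random error only

fit_clean = fit_karplus(phis, clean)
fit_noisy = fit_karplus(phis, noisy)
print(fit_clean, fit_noisy)
```

With error-free couplings the fit recovers the generating coefficients to numerical precision, while the noisy fit drifts from them. A plain least-squares fit of this kind has no mechanism for down-weighting outliers, which is the weakness the caption attributes to SVD and the Gaussian likelihood.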
Figure 3: Karplus curves with BICePs-refined Karplus coefficients using the 1D3Z ensemble for (a-c) ${^{3}\!J}_{C^{\prime}C^{\prime}}$, ${^{3}\!J}_{C^{\prime}C^{\beta}}$, and ${^{3}\!J}_{H^{\alpha} C^{\prime}}$. For comparison, the black dashed line shows an SVD fit on 1UBQ using experimental scalar coupling constants with $\phi$-angles derived from the X-ray structure, and red dots correspond to the fitted data points. Parameterizations from Bax et al. 1997 (green) and Habeck et al. 2005 (yellow) are also overlaid. The thickness of each line corresponds to the uncertainty.
Figure 4: Landscapes of the BICePs score with respect to the predicted Karplus coefficients for ${^{3}\!J}_{H^{N} C^{\prime}}$. Panels a, c and d illustrate the energy landscape $f$ for pairs of Karplus coefficients when using the 1D3Z structural ensemble during refinement.
Figure 5: Validation showing that BICePs-predicted Karplus coefficients perform similarly to Bax1997 and achieve minor improvements over Habeck2005 for scalar coupling predictions from the simulated CHARMM22* ensemble. Each panel, for (a) ${^{3}\!J}_{H^{\alpha} C^{\prime}}$, (b) ${^{3}\!J}_{C^{\prime}C^{\beta}}$, and (c) ${^{3}\!J}_{C^{\prime}C^{\prime}}$, shows strong correlations between predictions and experiment. Karplus coefficients derived from BICePs using the 2NR2 ensemble give the best performance for CHARMM22*. For the remaining sets of $J$-couplings, please see Figure \ref{fig:correlations_charmm22*}.
...and 19 more figures
