Uncertainty Quantification and Propagation in Surrogate-based Bayesian Inference

Philipp Reiser; Javier Enrique Aguilar; Anneli Guthke; Paul-Christian Bürkner

TL;DR

This work addresses uncertainty quantification and propagation in surrogate-based Bayesian inference for expensive simulators. It introduces a formal two-step framework (T-Step for surrogate training and I-Step for inference on real data) and a family of uncertainty propagation (UP) methods (Point, E-Post, E-Lik, E-Log-Lik) that carry surrogate uncertainty coherently into parameter inference. The study shows that propagating both epistemic and aleatoric surrogate uncertainty via E-Post or E-Lik improves posterior calibration across linear, nonlinear (logistic), and epidemiological (SIR) models, with validation based on simulation-based calibration (SBC) supporting the reliability of the approach. The findings underscore the practical importance of full surrogate UP for safe and trustworthy decision-making in computationally demanding applications, and suggest scaling and refining the surrogate error model as directions for future work.

Abstract

Surrogate models are statistical or conceptual approximations for more complex simulation models. In this context, it is crucial to propagate the uncertainty induced by limited simulation budget and surrogate approximation error to predictions, inference, and subsequent decision-relevant quantities. However, quantifying and then propagating the uncertainty of surrogates is usually limited to special analytic cases or is otherwise computationally very expensive. In this paper, we propose a framework enabling a scalable, Bayesian approach to surrogate modeling with thorough uncertainty quantification, propagation, and validation. Specifically, we present three methods for Bayesian inference with surrogate models given measurement data. This is a task where the propagation of surrogate uncertainty is especially relevant, because failing to account for it may lead to biased and/or overconfident estimates of the parameters of interest. We showcase our approach in three detailed case studies for linear and nonlinear real-world modeling scenarios. Uncertainty propagation in surrogate models enables more reliable and safe approximation of expensive simulators and will therefore be useful in various fields of applications.
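To make the difference between the propagation methods named in the TL;DR concrete, the following is a minimal sketch on a toy linear surrogate. It is not the authors' implementation. The definitions used here are assumptions read off the method names: Point plugs in a single estimate of the surrogate parameters, E-Post averages the per-draw posteriors, E-Lik averages the likelihood before applying the prior, and E-Log-Lik averages the log-likelihood. The T-posterior draws `theta`, the noise level `sigma`, the prior, and the grid are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical T-posterior draws of surrogate parameters theta = (a, b)
# for a toy surrogate y = a + b * omega (assumption, not from the paper).
S = 200
theta = np.column_stack([rng.normal(0.0, 0.2, S), rng.normal(1.0, 0.2, S)])

# Hypothetical I-Step measurements with known noise level sigma.
sigma = 0.5
omega_true = 0.3
y_I = omega_true + rng.normal(0.0, sigma, size=10)

# Grid over the unknown input omega and an unnormalised N(0, 1) log-prior.
omega = np.linspace(-3.0, 3.0, 601)
d_omega = omega[1] - omega[0]
log_prior = -0.5 * omega**2


def log_lik(theta_s):
    """Gaussian log-likelihood on the grid, up to a theta-independent constant."""
    a, b = theta_s
    resid = y_I[:, None] - (a + b * omega)[None, :]
    return -0.5 * np.sum((resid / sigma) ** 2, axis=0)


def normalise(log_unnorm):
    """Turn unnormalised log-density values into a density on the grid."""
    w = np.exp(log_unnorm - log_unnorm.max())
    return w / (w.sum() * d_omega)


def log_mean_exp(a):
    """Numerically stable log of the mean of exp(a) over the first axis."""
    m = a.max(axis=0)
    return m + np.log(np.mean(np.exp(a - m), axis=0))


def posterior_sd(p):
    """Standard deviation of a gridded density."""
    mean = np.sum(p * omega) * d_omega
    return np.sqrt(np.sum(p * (omega - mean) ** 2) * d_omega)


ll = np.stack([log_lik(t) for t in theta])  # shape (S, len(omega))

posteriors = {
    # Point: plug in the T-posterior mean of theta.
    "Point": normalise(log_prior + log_lik(theta.mean(axis=0))),
    # E-Post: average the normalised posteriors over theta draws.
    "E-Post": np.mean(np.stack([normalise(log_prior + l) for l in ll]), axis=0),
    # E-Lik: average the likelihood over theta draws, then apply the prior.
    "E-Lik": normalise(log_prior + log_mean_exp(ll)),
    # E-Log-Lik: average the log-likelihood over theta draws.
    "E-Log-Lik": normalise(log_prior + ll.mean(axis=0)),
}

for name, p in posteriors.items():
    print(f"{name:10s} posterior sd of omega: {posterior_sd(p):.3f}")
```

Comparing the printed standard deviations gives a rough sense of how much surrogate uncertainty each variant carries into the posterior of `omega`. A grid is used only because the toy problem is one-dimensional; higher-dimensional settings would need sampling instead.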

Paper Structure (64 sections, 74 equations, 15 figures, 2 tables)

Introduction
Method
Two-Step Procedure
First Step: Training the Surrogate (T-Step)
Simulator
Surrogate Model
Surrogate Approximation Error
Surrogate Training
Second Step: Inference on Real Data (I-Step)
Measurement Model
Inference
Uncertainty Propagation in the Two-Step Procedure
Point Estimate
Related work
Expected-Posterior
...and 49 more sections

Figures (15)

Figure 1: Overview of the two-step procedure. Left: In the surrogate training step (T-Step), training data are generated using a simulator and a surrogate model is fitted, which yields an estimate of the T-posterior. Right: In the surrogate-based inference step (I-Step), measurements along with the T-posterior are used to infer the I-posterior.
Figure 2: Graphical model of the T-Step and I-Step. Left: In the T-Step, the observed quantities are simulation parameters $\omega_T$, simulation output $y_T$, and the noise hyperparameters $\sigma_S$. The unknowns are the surrogate parameters $\theta$. Right: In the I-Step, measurement data $y_I$ is observed $N_I$ times and $S$ posterior samples of $\theta$ are propagated from the T-Step. The dashed arrow indicates that uncertainty in $\theta$ is propagated to the I-Step while $\theta$ is not updated using the data $y_I$. The unknowns to be inferred are the simulation parameters $\omega_I$ and the measurement error hyperparameters $\sigma_I$.
Figure 3: I-posterior densities for the linear surrogate with normal priors/likelihoods in case study 1. Data and parameters are as specified in Table \ref{table:setup_linear_case_study}. The I-posterior is computed with four different UP methods while the surrogate approximation error $\sigma_A \in \{0.1, 0.5, 1\}$ is varied.
Figure 4: Selected results for the two-step procedure with the logistic surrogate in case study 2. Left: For $N_T \in \{5, 7, 10\}$, the training data set $\mathcal{D}_T$ (black dots) and the mean of the T-posterior predictive distribution (red lines) are shown. Right: For each underlying true input $\omega_I^* \in \{-0.05, 0.1, 0.3\}$ (black vertical lines), we depict the I-posterior distributions obtained with each of the Point, E-Lik, E-Post, and E-Log-Lik methods (colored lines).
Figure 5: Calibration and sharpness of the I-posteriors using the logistic surrogate in case study 2. Top: ECDF difference plots for the I-posterior distributions of $\omega_I$ resulting from the four different methods. The blue areas indicate 95% confidence envelopes and the black lines indicate the empirical cumulative distribution function (ECDF) for two different numbers of simulation points, $N_T \in \{5, 10\}$. Center: log-gamma statistics of SBC, with the calibration threshold depicted as a black horizontal line. Bottom: sharpness (90% CI) of the I-posterior for the four different I-Steps (colored dots/lines) for $N_T \in \{5, 6, 7, 8, 9, 10\}$.
...and 10 more figures
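
The ECDF difference plots in the Figure 5 caption come from simulation-based calibration (SBC). As a rough illustration of how such a diagnostic is computed, here is a minimal sketch on a conjugate normal model whose posterior is known in closed form. The model, the sample sizes, and all variable names are assumptions chosen for illustration; the sketch does not reproduce the paper's surrogate-based I-posteriors or its log-gamma statistic.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical SBC settings: number of simulated data sets, observations
# per data set, and posterior draws per data set.
n_sims, n_obs, n_draws = 500, 5, 99

ranks = np.empty(n_sims, dtype=int)
for i in range(n_sims):
    # 1) Draw a "true" parameter from the prior N(0, 1).
    w_true = rng.normal(0.0, 1.0)
    # 2) Simulate data from the likelihood N(w_true, 1).
    y = rng.normal(w_true, 1.0, size=n_obs)
    # 3) Sample the exact conjugate posterior N(sum(y) / (n + 1), 1 / (n + 1)).
    post_var = 1.0 / (1.0 + n_obs)
    post_mean = post_var * y.sum()
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    # 4) Rank of the true value among the posterior draws.
    ranks[i] = np.sum(draws < w_true)

# For a calibrated posterior the ranks are uniform on {0, ..., n_draws}.
# The ECDF difference is the ECDF of the scaled ranks minus the uniform CDF.
z = np.linspace(0.0, 1.0, 101)
u = (ranks + 0.5) / (n_draws + 1)
ecdf = np.mean(u[:, None] <= z[None, :], axis=0)
ecdf_diff = ecdf - z

print("max |ECDF difference|:", np.abs(ecdf_diff).max())
```

Because the posterior here is exact, the ECDF difference should stay close to zero. A miscalibrated posterior, for example one that is too narrow, would show systematic departures from zero.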
