Uncertainty Quantification and Propagation in Surrogate-based Bayesian Inference

Philipp Reiser, Javier Enrique Aguilar, Anneli Guthke, Paul-Christian Bürkner

TL;DR

This work addresses uncertainty quantification and propagation in surrogate-based Bayesian inference for expensive simulators. It introduces a formal two-step framework, with a T-Step for surrogate training and an I-Step for inference on real data, together with a family of uncertainty propagation (UP) methods (Point, E-Post, E-Lik, E-Log-Lik) that carry surrogate uncertainty into parameter inference. The study shows that propagating both epistemic and aleatoric surrogate uncertainty via E-Post or E-Lik improves posterior calibration across linear, nonlinear (logistic), and epidemiological (SIR) models, with validation by simulation-based calibration (SBC) supporting the reliability of the approach. The findings underscore the practical importance of full surrogate UP for safe and trustworthy decision-making in computationally demanding applications, and they suggest scaling the approach and refining the surrogate error model as future work.

Abstract

Surrogate models are statistical or conceptual approximations of more complex simulation models. In this context, it is crucial to propagate the uncertainty induced by a limited simulation budget and by surrogate approximation error to predictions, inference, and subsequent decision-relevant quantities. However, quantifying and then propagating the uncertainty of surrogates is usually limited to special analytic cases or is otherwise computationally very expensive. In this paper, we propose a framework enabling a scalable, Bayesian approach to surrogate modeling with thorough uncertainty quantification, propagation, and validation. Specifically, we present three methods for Bayesian inference with surrogate models given measurement data. This is a task where the propagation of surrogate uncertainty is especially relevant, because failing to account for it may lead to biased and/or overconfident estimates of the parameters of interest. We showcase our approach in three detailed case studies for linear and nonlinear real-world modeling scenarios. Uncertainty propagation in surrogate models enables more reliable and safe approximation of expensive simulators and will therefore be useful in various fields of application.
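To make the two-step idea concrete, the following is a minimal sketch on a toy problem, not the authors' implementation. It assumes a hypothetical linear simulator, a conjugate Bayesian linear surrogate with known noise, a standard normal prior on the input, and a grid approximation of the I-posterior. It also assumes one plausible reading of the method names: Point plugs in the T-posterior mean, E-Post averages per-draw posteriors, E-Lik averages the likelihood, and E-Log-Lik averages the log-likelihood over T-posterior draws. Only the parameter (epistemic) uncertainty of the surrogate is propagated here; all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(omega):
    # Hypothetical stand-in for an expensive simulator (assumption).
    return 1.0 + 2.0 * omega

# T-Step: fit a Bayesian linear surrogate y = a + b * omega to simulator runs.
omega_T = np.linspace(-1.0, 1.0, 5)           # training inputs
sigma_S = 0.3                                  # assumed known training noise
y_T = simulator(omega_T) + rng.normal(0.0, sigma_S, omega_T.size)
X = np.column_stack([np.ones_like(omega_T), omega_T])
post_cov = np.linalg.inv(np.eye(2) + X.T @ X / sigma_S**2)  # N(0, I) prior on (a, b)
post_mean = post_cov @ (X.T @ y_T) / sigma_S**2
theta = rng.multivariate_normal(post_mean, post_cov, size=200)  # T-posterior draws

# I-Step: infer omega_I from measurements, given the T-posterior draws.
sigma_I = 0.2                                  # assumed known measurement noise
y_I = simulator(0.4) + rng.normal(0.0, sigma_I, 10)
grid = np.linspace(-3.0, 3.0, 601)
d = grid[1] - grid[0]
log_prior = -0.5 * grid**2                     # N(0, 1) prior, up to a constant

def log_lik(th):
    # Log-likelihood of y_I on the grid for each row of th; returns (S, G).
    mu = th[:, [0]] + th[:, [1]] * grid[None, :]
    resid = y_I[None, None, :] - mu[:, :, None]
    return -0.5 * np.sum(resid**2, axis=2) / sigma_I**2

def normalize(log_p):
    # Convert unnormalized log densities (last axis = grid) to grid densities.
    p = np.exp(log_p - log_p.max(axis=-1, keepdims=True))
    return p / (p.sum(axis=-1, keepdims=True) * d)

ll = log_lik(theta)
point = normalize(log_lik(post_mean[None, :])[0] + log_prior)
e_post = normalize(ll + log_prior).mean(axis=0)           # average of posteriors
m = ll.max(axis=0)                                          # log-sum-exp shift
e_lik = normalize(m + np.log(np.exp(ll - m).mean(axis=0)) + log_prior)
e_log_lik = normalize(ll.mean(axis=0) + log_prior)          # average log-likelihood

def post_sd(p):
    # Standard deviation of a density tabulated on the grid.
    mean = np.sum(grid * p) * d
    return np.sqrt(np.sum((grid - mean) ** 2 * p) * d)

results = {"Point": point, "E-Post": e_post, "E-Lik": e_lik, "E-Log-Lik": e_log_lik}
for name, p in results.items():
    print(f"{name:10s} posterior sd = {post_sd(p):.3f}")
```

In this toy setting, comparing the printed standard deviations gives a quick sense of how much each variant widens the I-posterior relative to the plug-in Point approach. The grid approximation only works because the input is one-dimensional; higher-dimensional problems would need a sampler instead.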

Paper Structure

This paper contains 64 sections, 74 equations, 15 figures, and 2 tables.

Figures (15)

  • Figure 1: Overview of the two-step procedure. Left: In the surrogate training step (T-Step), training data are generated using a simulator and a surrogate model is fitted, which yields an estimate of the T-posterior. Right: In the surrogate-based inference step (I-Step), measurements along with the T-posterior are used to infer the I-posterior.
  • Figure 2: Graphical model of the T-Step and I-Step. Left: In the T-Step, the observed quantities are simulation parameters $\omega_T$, simulation output $y_T$, and the noise hyperparameters $\sigma_S$. The unknowns are the surrogate parameters $\theta$. Right: In the I-Step, measurement data $y_I$ is observed $N_I$ times and $S$ posterior samples of $\theta$ are propagated from the T-Step. The dashed arrow indicates that uncertainty in $\theta$ is propagated to the I-Step while $\theta$ is not updated using the data $y_I$. The unknowns to be inferred are the simulation parameters $\omega_I$ and the measurement error hyperparameters $\sigma_I$.
  • Figure 3: I-posterior densities for the linear surrogate with normal priors/likelihoods in case study 1, using the data and parameters specified in Table \ref{table:setup_linear_case_study}. The I-posterior is computed with four different UP methods while the surrogate approximation error is varied over $\sigma_A \in \{0.1, 0.5, 1\}$.
  • Figure 4: Selected results for the two-step procedure with the logistic surrogate in case study 2. Left: For $N_T \in \{5, 7, 10\}$, the training data set $\mathcal{D}_T$ (black dots) and the mean of the T-posterior predictive distribution (red lines) are shown. Right: For each underlying true input $\omega_I^* \in \{-0.05, 0.1, 0.3\}$ (black vertical lines), we depict the I-posterior distributions obtained with each of the Point, E-Lik, E-Post, and E-Log-Lik methods (colored lines).
  • Figure 5: Calibration and sharpness of the I-posteriors using the logistic surrogate in case study 2. Top: ECDF difference plots for the I-posterior distributions of $\omega_I$ resulting from the four different methods. The blue areas indicate 95% confidence envelopes, and the black lines show the empirical cumulative distribution function (ECDF) differences for two numbers of simulation points, $N_T \in \{5, 10\}$. Center: log-gamma statistics of SBC, with the calibration threshold depicted as a black horizontal line. Bottom: sharpness (90% CI) of the I-posterior for the four different I-Steps (colored dots/lines) for $N_T \in \{5, 6, 7, 8, 9, 10\}$.
  • ...and 10 more figures
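
The SBC-based calibration check mentioned in the Figure 5 caption can be illustrated with a small sketch. It is not the paper's setup: it assumes a conjugate normal model with a known analytic posterior instead of a surrogate, and the function names, sample sizes, and the `shrink` parameter (used to fake an overconfident posterior) are hypothetical. The idea is that, for a calibrated posterior, the rank of the true parameter among posterior draws is uniformly distributed, so the difference between the rank ECDF and the uniform CDF stays close to zero.

```python
import numpy as np

def sbc_ranks(n_sims, n_draws, shrink, rng, n_obs=5, sigma=1.0):
    # Rank of the true parameter among posterior draws, once per simulated data set.
    # shrink = 1.0 uses the exact posterior; shrink < 1.0 is deliberately overconfident.
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        omega_star = rng.normal()                       # draw from the N(0, 1) prior
        y = rng.normal(omega_star, sigma, n_obs)        # simulate measurements
        prec = 1.0 + n_obs / sigma**2                   # conjugate posterior precision
        mean = y.sum() / sigma**2 / prec                # conjugate posterior mean
        draws = rng.normal(mean, shrink / np.sqrt(prec), n_draws)
        ranks[i] = np.sum(draws < omega_star)
    return ranks

def ecdf_diff(ranks, n_draws):
    # ECDF of the ranks minus the CDF of a discrete uniform on {0, ..., n_draws}.
    k = np.arange(n_draws + 1)
    ecdf = (ranks[:, None] <= k[None, :]).mean(axis=0)
    return ecdf - (k + 1) / (n_draws + 1)

rng = np.random.default_rng(1)
L = 99
ranks_ok = sbc_ranks(1000, L, shrink=1.0, rng=rng)
ranks_over = sbc_ranks(1000, L, shrink=0.5, rng=rng)

print("max |ECDF diff|, exact posterior:        ", np.abs(ecdf_diff(ranks_ok, L)).max())
print("max |ECDF diff|, overconfident posterior:", np.abs(ecdf_diff(ranks_over, L)).max())
```

An overconfident posterior places the true value in the tails of the posterior draws too often, which shows up as a large excursion of the ECDF difference. The confidence envelopes and the log-gamma statistic from the caption are not reproduced in this sketch.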