Surrogate-based Bayesian calibration methods for chaotic systems: a comparison of traditional and non-traditional approaches

Maike F. Holthuijzen, Atlanta Chakraborty, Elizabeth Krath, Tommie Catanach

TL;DR

This work tackles the challenge of calibrating parameters in chaotic, computationally expensive dynamical systems by comparing four emulator-based Bayesian strategies: Calibrate–Emulate–Sample (CES), History Matching (HM), Bayesian Optimal Experimental Design (BOED), and its goal-oriented variant (GBOED). Using Gaussian process surrogates to emulate forward maps, the study evaluates performance on the Lorenz '96 multiscale system and a two-layer quasi-geostrophic model, highlighting how design choices and computational budgets impact posterior accuracy and uncertainty. A key contribution is the introduction of GBOED, which targets information gain about the calibration posterior rather than emulator fidelity alone, and the systematic comparison reveals that CES, HM, and GBOED can match or exceed BOED in calibration tasks, with standard BOED underperforming in chaotic settings. The findings provide practical guidance for selecting emulator-based calibration strategies under budget constraints and motivate hybrid designs that combine global efficiency with calibration-focused refinement for complex, high-cost systems.

Abstract

Parameter calibration is essential for reducing uncertainty and improving predictive fidelity in physics-based models, yet it is often limited by the high computational cost of model evaluations. Bayesian calibration methods provide a principled framework for combining prior information with data while rigorously quantifying uncertainty. In this work, we compare four emulator-based Bayesian calibration strategies: Calibrate-Emulate-Sample (CES), History Matching (HM), Bayesian Optimal Experimental Design (BOED), and a goal-oriented extension of BOED (GBOED). The proposed GBOED formulation explicitly targets information gain with respect to the calibration posterior, aligning design decisions with downstream inference. We assess methods using accuracy and uncertainty quantification metrics, convergence behavior under increasing computational budgets, and practical considerations such as implementation complexity and robustness. For the Lorenz '96 system, CES, HM, and GBOED all yield strong calibration performance, even with limited numbers of model evaluations, while standard BOED generally underperforms in this setting. Differences among the strongest methods are modest, particularly as computational budgets increase. For the two-layer quasi-geostrophic system, all methods produce reasonable posterior estimates, and convergence behavior is more consistent. Overall, our results indicate that multiple emulator-based calibration strategies can perform comparably well when applied appropriately, with method selection often guided more by computational and practical considerations than by accuracy alone. These findings highlight both the limitations of standard BOED for calibration and the promise of goal-oriented and iterative approaches for efficient Bayesian inference in complex dynamical systems.
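
For orientation on the first test problem, the following is a minimal sketch of the two-scale Lorenz '96 tendencies in their commonly used textbook form, with the four calibrated parameters named $h, F, c, b$ as in the figure captions. The exact formulation used in the paper may differ; the function name, the array layout, and the dimensions `K` and `J` are assumptions made here for illustration only.

```python
import numpy as np


def lorenz96_two_scale_tendencies(X, Y, h, F, c, b):
    """Time derivatives of the two-scale Lorenz '96 system (common textbook form).

    X: slow variables, shape (K,).
    Y: fast variables, shape (J, K); Y[j, k] is the j-th fast variable
       coupled to X[k]. This layout is an illustrative assumption.
    """
    J, K = Y.shape

    # Slow scale: advection, damping, forcing F, and coupling to the fast scale.
    dX = (
        -np.roll(X, 1) * (np.roll(X, 2) - np.roll(X, -1))
        - X
        + F
        - (h * c / b) * Y.sum(axis=0)
    )

    # Fast scale: treat all J*K fast variables as one periodic ring,
    # ordered so that the J variables attached to X[0] come first.
    y = Y.T.reshape(-1)
    dy = (
        -c * b * np.roll(y, -1) * (np.roll(y, -2) - np.roll(y, 1))
        - c * y
        + (h * c / b) * np.repeat(X, J)
    )
    return dX, dy.reshape(K, J).T


# Hypothetical dimensions; parameter values follow the base setting
# theta = (h, F, c, b) = (1, 10, 10, 10) quoted in the Figure 2 caption.
K, J = 8, 4
rng = np.random.default_rng(0)
X0 = rng.normal(size=K)
Y0 = 0.1 * rng.normal(size=(J, K))
dX0, dY0 = lorenz96_two_scale_tendencies(X0, Y0, h=1.0, F=10.0, c=10.0, b=10.0)
```

In a calibration setting, a time integrator would wrap this function to produce the statistics that a Gaussian process emulator is trained on; that step is omitted here because the paper's choice of observables is not stated in this section.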

Paper Structure

This paper contains 24 sections, 19 equations, 6 figures, 1 table, 3 algorithms.

Figures (6)

  • Figure 1: Frequency histograms of posterior distributions of parameters $h, F, \log c$, and $b$ resulting from CES200, HM200, BOED200, and GBOED200 (left) and CES5400, HM1000, BOED1000, and GBOED1000 (right). Histograms represent 8092 values. True parameter values are denoted by vertical lines.
  • Figure 2: AEs (left) and scores (right) for CES, EKS, HM, BOED, and GBOED over increasing model evaluations for calibrating the Lorenz '96 system using base parameters $\theta = (1, 10, 10, 10)$ (results for parameter $c$ are calculated in log space).
  • Figure 3: Boxplots of AEs (A) and log score (B) for calibrated parameters $h, F, \log c$ and $b$ of the Lorenz '96 model for all methods. Each boxplot for AE represents 60 values; values less than -50 for score were omitted. AE values were calculated using the mean of the posterior distribution. Horizontal lines within boxplots represent the median, while stars denote means.
  • Figure 4: Estimated marginal means of absolute errors (AEs) (top) and score (bottom) averaged over the four parameters for the AE and score linear models described in Section 3.2. Error bars represent 95% confidence intervals (note that the confidence intervals reflect uncertainty around the estimated marginal means, not the raw data).
  • Figure 5: Frequency histograms of posterior distributions of $\log \mu, \log \nu, H_1$ and $H_2$ for 200 model evaluations (left) and 500 model evaluations (right). Histograms represent 8092 values. True parameter values ($\theta = (\log 0.032, \log(9.5 \times 10^{-6}), 0.25, 0.85)$ for $\log \mu, \log \nu, H_1$ and $H_2$) are denoted by vertical black lines.
  • ...and 1 more figure