Surrogate-based Bayesian calibration methods for chaotic systems: a comparison of traditional and non-traditional approaches
Maike F. Holthuijzen, Atlanta Chakraborty, Elizabeth Krath, Tommie Catanach
TL;DR
This work tackles the challenge of calibrating parameters in chaotic, computationally expensive dynamical systems by comparing four emulator-based Bayesian strategies: Calibrate–Emulate–Sample (CES), History Matching (HM), Bayesian Optimal Experimental Design (BOED), and its goal-oriented variant (GBOED). Using Gaussian process surrogates to emulate the forward maps, the study evaluates performance on the Lorenz '96 multiscale system and a two-layer quasi-geostrophic model, highlighting how design choices and computational budgets affect posterior accuracy and uncertainty. A key contribution is GBOED, which targets information gain about the calibration posterior rather than emulator fidelity alone. The systematic comparison shows that CES, HM, and GBOED can match or exceed BOED in calibration tasks, with standard BOED underperforming on the Lorenz '96 system. The findings provide practical guidance for selecting emulator-based calibration strategies under budget constraints and motivate hybrid designs that combine global efficiency with calibration-focused refinement for complex, high-cost systems.
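To make the pairing of a Gaussian process surrogate with History Matching concrete, here is a minimal one-wave sketch in NumPy. It is not the authors' implementation: the toy forward map `forward_map` (a sine standing in for an expensive simulator), the fixed squared-exponential kernel hyperparameters, the observation noise, the zero model-discrepancy variance, and all helper names are hypothetical choices for illustration. It uses the standard implausibility measure with the conventional cutoff of 3 to rule out parameter values whose emulated output is too far from the observation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale, variance):
    # Squared-exponential covariance between the rows of A (n, d) and B (m, d)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_predict(X_train, y_train, X_test, lengthscale=0.3, variance=1.0, jitter=1e-6):
    # Zero-mean GP posterior mean and variance; hyperparameters are fixed here, not fitted
    K = rbf_kernel(X_train, X_train, lengthscale, variance) + jitter * np.eye(len(X_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_s = rbf_kernel(X_train, X_test, lengthscale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off

def implausibility(z, em_mean, em_var, obs_var, disc_var=0.0):
    # I(theta) = |z - E[f(theta)]| / sqrt(emulator var + observation var + discrepancy var)
    return np.abs(z - em_mean) / np.sqrt(em_var + obs_var + disc_var)

# Hypothetical toy forward map standing in for an expensive simulator
def forward_map(theta):
    return np.sin(3.0 * theta[:, 0])

theta_design = np.linspace(0.0, 1.0, 8)[:, None]  # a handful of "expensive" runs
y_design = forward_map(theta_design)
z_obs = forward_map(np.array([[0.6]]))[0]         # synthetic observation, true theta = 0.6

theta_grid = np.linspace(0.0, 1.0, 201)[:, None]  # cheap candidate parameters
m, s2 = gp_predict(theta_design, y_design, theta_grid)
I = implausibility(z_obs, m, s2, obs_var=0.05**2)
nroy = I < 3.0  # "not ruled out yet" set under the conventional 3-sigma cutoff
```

The result of a wave is a retained region of parameter space rather than a point estimate; in an iterated scheme, new simulator runs would be placed inside `nroy`, the emulator refit, and the cutoff reapplied.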
Abstract
Parameter calibration is essential for reducing uncertainty and improving predictive fidelity in physics-based models, yet it is often limited by the high computational cost of model evaluations. Bayesian calibration methods provide a principled framework for combining prior information with data while rigorously quantifying uncertainty. In this work, we compare four emulator-based Bayesian calibration strategies: Calibrate-Emulate-Sample (CES), History Matching (HM), Bayesian Optimal Experimental Design (BOED), and a goal-oriented extension of BOED (GBOED). The proposed GBOED formulation explicitly targets information gain with respect to the calibration posterior, aligning design decisions with downstream inference. We assess the methods using accuracy and uncertainty quantification metrics, convergence behavior under increasing computational budgets, and practical considerations such as implementation complexity and robustness. For the Lorenz '96 system, CES, HM, and GBOED all yield strong calibration performance, even with a limited number of model evaluations, while standard BOED generally underperforms in this setting. Differences among the strongest methods are modest, particularly as computational budgets increase. For the two-layer quasi-geostrophic system, all methods produce reasonable posterior estimates, and convergence behavior is more consistent than for the Lorenz '96 system. Overall, our results indicate that multiple emulator-based calibration strategies can perform comparably well when applied appropriately, with method selection often guided more by computational and practical considerations than by accuracy alone. These findings highlight both the limitations of standard BOED for calibration and the promise of goal-oriented and iterative approaches for efficient Bayesian inference in complex dynamical systems.
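For readers unfamiliar with the Lorenz '96 test case, the sketch below shows what an expensive forward map for it might look like. It assumes the standard two-scale formulation, with slow variables X_k and fast variables Y_{j,k} on a single periodic ring, and the commonly used values F = 10, h = 1, c = 10, b = 10, K = 8, J = 32. The source does not state which parameters are calibrated, which summary statistics serve as data, or which integrator is used, so the fixed-step RK4 integrator, the choice of time-averaged means of X and X² as outputs (a common choice for chaotic systems, assumed here), and all function names are hypothetical.

```python
import numpy as np

def l96_two_scale_rhs(X, Y, F=10.0, h=1.0, c=10.0, b=10.0):
    """Tendencies of the two-scale Lorenz '96 system (assumed standard form).

    X has shape (K,); Y has shape (K*J,) and forms one periodic ring, with
    Y[k*J:(k+1)*J] being the fast variables attached to X[k].
    """
    K = X.size
    J = Y.size // K
    coupling = h * c / b
    # Slow variables: advection, damping, forcing, and feedback from their fast variables
    dX = (-np.roll(X, 1) * (np.roll(X, 2) - np.roll(X, -1)) - X + F
          - coupling * Y.reshape(K, J).sum(axis=1))
    # Fast variables: time scale set by c, amplitude by b, forced by their slow variable
    dY = (-c * b * np.roll(Y, -1) * (np.roll(Y, -2) - np.roll(Y, 1)) - c * Y
          + coupling * np.repeat(X, J))
    return dX, dY

def rk4_step(X, Y, dt, **params):
    # One classical fourth-order Runge-Kutta step for the coupled state (X, Y)
    k1 = l96_two_scale_rhs(X, Y, **params)
    k2 = l96_two_scale_rhs(X + 0.5 * dt * k1[0], Y + 0.5 * dt * k1[1], **params)
    k3 = l96_two_scale_rhs(X + 0.5 * dt * k2[0], Y + 0.5 * dt * k2[1], **params)
    k4 = l96_two_scale_rhs(X + dt * k3[0], Y + dt * k3[1], **params)
    X_new = X + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    Y_new = Y + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return X_new, Y_new

def time_averaged_stats(K=8, J=32, dt=0.001, n_spinup=2000, n_avg=3000, seed=0, **params):
    # Hypothetical forward map: simulate, discard spin-up, then average X and X**2
    # over time and over the K sectors
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(K)
    Y = 0.1 * rng.standard_normal(K * J)
    for _ in range(n_spinup):
        X, Y = rk4_step(X, Y, dt, **params)
    sums = np.zeros(2)
    for _ in range(n_avg):
        X, Y = rk4_step(X, Y, dt, **params)
        sums += [X.mean(), (X**2).mean()]
    return sums / n_avg

stats = time_averaged_stats(F=10.0)
```

Averaged statistics are used instead of raw trajectories because chaotic trajectories from nearby parameters decorrelate quickly, while long-time statistics vary more smoothly with the parameters and are therefore easier to emulate. In a calibration loop, a function like `time_averaged_stats` would be evaluated at a small set of design values of, for example, `F`, and a surrogate would be fit to its outputs.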
