Combining Climate Models using Bayesian Regression Trees and Random Paths
John C. Yannotty, Thomas J. Santner, Bo Li, Matthew T. Pratola
TL;DR
This paper tackles the challenge of integrating multiple climate models with input-dependent, smooth weighting by introducing Random Path BART (RPBART), a matrix-free, continuous extension of Bayesian Additive Regression Trees. RPBART uses latent random-path indicators to smooth the mean function across tree partitions, and a semivariogram-informed prior calibration guides hyperparameters without cross-validation. The framework extends to Bayesian Model Mixing (RPBART-BMM) for combining multiple simulators, with posterior weight projections enabling simplex-constrained interpretations of model contributions. Demonstrations on eight GCMs for global surface temperature show improved predictive accuracy and richer regional diagnostics, illustrating the method’s practical value for climate-model ensembles and beyond.
Abstract
General circulation models (GCMs) are essential tools for climate studies. The accuracy of these models can vary across the input domain, and no single model is uniformly best. One can improve climate model prediction performance by integrating multiple models using input-dependent weights. Weight functions modeled using Bayesian Additive Regression Trees (BART) were recently shown to be useful in nuclear physics applications. However, that approach was restricted to piecewise constant weight functions. To smoothly integrate multiple climate models, we propose a new tree-based model, Random Path BART (RPBART), that incorporates random path assignments in BART to produce smooth weight functions and smooth predictions, all in a matrix-free formulation. RPBART requires a more complex prior specification, for which we introduce a semivariogram to guide hyperparameter selection. This approach is easy to interpret, computationally cheap, and avoids expensive cross-validation. Finally, we propose a posterior projection technique to enable detailed analysis of the fitted weight functions. This allows us to identify a sparse set of climate models that recovers the underlying system within a given spatial region and to quantify model discrepancy given the available model set. Our method is demonstrated on an ensemble of 8 GCMs modeling average monthly surface temperature.
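
To make the idea of input-dependent weights and a simplex-constrained projection concrete, the following is a minimal sketch and not the RPBART-BMM implementation. It assumes that weights have already been estimated at each input location (the numbers below are invented for illustration) and uses the standard Euclidean projection onto the probability simplex, which may differ from the projection used in the paper. The names `project_to_simplex` and `mix_predictions` are hypothetical.

```python
import numpy as np

def project_to_simplex(w):
    """Euclidean projection of a 1-D weight vector onto the probability simplex
    (nonnegative entries that sum to one), via the sort-and-threshold method."""
    w = np.asarray(w, dtype=float)
    u = np.sort(w)[::-1]                 # weights in decreasing order
    css = np.cumsum(u)                   # running sums of the sorted weights
    ks = np.arange(1, w.size + 1)
    # Keep the largest k for which the k-th sorted weight stays positive
    # after subtracting the shift implied by the first k weights.
    cond = u - (css - 1.0) / ks > 0
    rho = ks[cond][-1]
    tau = (css[rho - 1] - 1.0) / rho     # common shift applied to all weights
    return np.maximum(w - tau, 0.0)

def mix_predictions(weights, model_outputs):
    """Pointwise mixture sum_k w_k(x) * f_k(x).
    Both arrays have shape (n_points, n_models)."""
    return np.sum(weights * model_outputs, axis=1)

# Illustrative (made-up) unconstrained weights for 3 models at 2 input locations.
raw_weights = np.array([[0.7, 0.5, -0.1],
                        [0.2, 0.2, 0.2]])

# Illustrative (made-up) model outputs at the same 2 locations.
model_outputs = np.array([[14.0, 15.0, 13.0],
                          [10.0, 11.0, 12.0]])

# Project each location's weight vector onto the simplex, row by row.
projected = np.apply_along_axis(project_to_simplex, 1, raw_weights)

mixed_raw = mix_predictions(raw_weights, model_outputs)
mixed_projected = mix_predictions(projected, model_outputs)

print(projected)        # rows are nonnegative and sum to one
print(mixed_raw)        # mixture under the unconstrained weights
print(mixed_projected)  # mixture under the projected weights
```

In this sketch the projected weights at the first location put zero weight on the third model, which illustrates how a simplex-constrained view can point to a sparse subset of models at a given location. The difference between `mixed_raw` and `mixed_projected` is computed here purely for illustration.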
