Combining Climate Models using Bayesian Regression Trees and Random Paths

John C. Yannotty, Thomas J. Santner, Bo Li, Matthew T. Pratola

TL;DR

This paper tackles the challenge of integrating multiple climate models with input-dependent, smooth weighting by introducing Random Path BART (RPBART), a matrix-free, continuous extension of Bayesian Additive Regression Trees. RPBART uses latent random-path indicators to smooth the mean function across tree partitions, and a semivariogram-informed prior calibration guides hyperparameters without cross-validation. The framework extends to Bayesian Model Mixing (RPBART-BMM) for combining multiple simulators, with posterior weight projections enabling simplex-constrained interpretations of model contributions. Demonstrations on eight GCMs for global surface temperature show improved predictive accuracy and richer regional diagnostics, illustrating the method’s practical value for climate-model ensembles and beyond.
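
As a rough illustration of what "input-dependent weighting" means, the sketch below mixes two toy simulators pointwise as $\sum_k w_k(x) f_k(x)$. Everything in it is an assumption made for illustration: the simulators `f1` and `f2`, the hand-picked logistic weight `w1`, and the helper name `mix_predictions` do not come from the paper, where the weight functions are learned from data by RPBART-BMM rather than fixed by hand.

```python
import numpy as np

def mix_predictions(weights, simulator_outputs):
    """Pointwise mixture sum_k w_k(x) * f_k(x).

    Both arguments have shape (n_points, n_models); row i holds the
    weights (or simulator outputs) at the i-th input location.
    """
    weights = np.asarray(weights, dtype=float)
    simulator_outputs = np.asarray(simulator_outputs, dtype=float)
    if weights.shape != simulator_outputs.shape:
        raise ValueError("weights and simulator_outputs must share a shape")
    return np.sum(weights * simulator_outputs, axis=1)

# Hypothetical toy setup: two simulators on a 1-D input grid.
x = np.linspace(-1.0, 1.0, 5)
f1 = x ** 2           # hypothetical simulator 1
f2 = np.abs(x)        # hypothetical simulator 2

# Hand-picked smooth weight that favours simulator 1 for large x.
w1 = 1.0 / (1.0 + np.exp(-4.0 * x))
W = np.column_stack([w1, 1.0 - w1])
F = np.column_stack([f1, f2])

mixed = mix_predictions(W, F)
```

Because the weights vary with `x`, each simulator can dominate in the part of the input space where it is trusted most, which is the behaviour a single global weight per model cannot express.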

Abstract

General circulation models (GCMs) are essential tools for climate studies. Such climate models may have varying accuracy across the input domain, but no model is uniformly best. One can improve climate model prediction performance by integrating multiple models using input-dependent weights. Weight functions modeled using Bayesian Additive Regression Trees (BART) were recently shown to be useful in nuclear physics applications. However, a restriction of that approach was the piecewise constant weight functions. To smoothly integrate multiple climate models, we propose a new tree-based model, Random Path BART (RPBART), that incorporates random path assignments in BART to produce smooth weight functions and smooth predictions, all in a matrix-free formulation. RPBART requires a more complex prior specification, for which we introduce a semivariogram to guide hyperparameter selection. This approach is easy to interpret, computationally cheap, and avoids expensive cross-validation. Finally, we propose a posterior projection technique to enable detailed analysis of the fitted weight functions. This allows us to identify a sparse set of climate models that recovers the underlying system within a given spatial region as well as quantifying model discrepancy given the available model set. Our method is demonstrated on an ensemble of 8 GCMs modeling average monthly surface temperature.
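
To make the idea of a simplex-constrained reading of the weights concrete, here is a minimal sketch of the standard Euclidean projection of a weight vector onto the probability simplex (nonnegative entries summing to one). This is an assumption chosen for illustration: the abstract does not specify the projection, so the paper's posterior projection may be defined differently, and the name `project_to_simplex` is hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a 1-D vector onto {w : w >= 0, sum(w) = 1}.

    Sort-and-threshold algorithm: find the shift theta such that
    max(v - theta, 0) sums to one.
    """
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, v.size + 1)
    # Largest k for which the k-th sorted entry stays positive after shifting.
    positive = u - (css - 1.0) / ks > 0
    rho = ks[positive][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

# Hypothetical unconstrained weights for three models at one location.
raw_weights = np.array([0.6, 0.7, -0.2])
projected = project_to_simplex(raw_weights)
```

A projection of this kind sets some entries exactly to zero, which fits the abstract's goal of identifying a sparse set of models within a region. Applied to each posterior draw of the weights, it would yield a posterior over simplex-constrained weights, under the assumptions stated above.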

Paper Structure

This paper contains 32 sections, 4 theorems, 54 equations, 17 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 3.1

Assume the random quantities $\lbrace T_j, M_j, Z_j, \gamma_j \rbrace_{j = 1}^m$ are distributed as specified in the section on the random path prior (subsect:rpath_prior). Conditional on $\sigma^2$, the function $\nu(\boldsymbol x, \boldsymbol h)$ for the RPBART model is [displayed equation missing from this extract], where $m\tau^2 = \left(\frac{y_\text{max}-y_\text{min}}{2k}\right)^2$ defines the variance of the sum-of-trees, $k$ is a tuning parameter, $y_\text{max} - y_\text{min}$ …
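
The calibration $m\tau^2 = \left(\frac{y_\text{max}-y_\text{min}}{2k}\right)^2$ stated in the theorem can be checked with a few lines of arithmetic. The sketch below reads $\tau$ as the prior standard deviation of each terminal-node parameter, as in standard BART; that reading, the function names, and the choice $m = 20$ are assumptions made for illustration. The values $y_\text{min} = -1$, $y_\text{max} = 1$ and $k \in \{1, 1.5, 2\}$ are the ones quoted in the Figure 4 caption.

```python
import math

def sum_of_trees_variance(y_min, y_max, k):
    """Prior variance of the sum-of-trees: ((y_max - y_min) / (2k))^2."""
    if k <= 0:
        raise ValueError("k must be positive")
    return ((y_max - y_min) / (2.0 * k)) ** 2

def leaf_prior_sd(y_min, y_max, k, m):
    """tau solving m * tau^2 = ((y_max - y_min) / (2k))^2 for m trees."""
    if m <= 0:
        raise ValueError("m must be positive")
    return math.sqrt(sum_of_trees_variance(y_min, y_max, k) / m)

# Values of k from the Figure 4 caption; m = 20 is a hypothetical choice.
for k in (1.0, 1.5, 2.0):
    var = sum_of_trees_variance(-1.0, 1.0, k)
    tau = leaf_prior_sd(-1.0, 1.0, k, m=20)
    print(f"k={k}: m*tau^2={var:.4f}, tau={tau:.4f}")
```

Larger $k$ shrinks the prior variance of the sum-of-trees, so for a fixed number of trees each terminal-node parameter is pulled more strongly toward zero.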

Figures (17)

  • Figure 1: (Left) An example of a tree structure $T$ applied to a $2$-dimensional input space. The internal and terminal nodes of the tree are denoted with superscripts $(i)$ and $(t)$, respectively. Each internal node facilitates a binary split of the form $x_v < c_v$. (Right) The partitions of the rectangular input space $[L_1,U_1]\times[L_2,U_2]$, with associated terminal node parameters $\mu_{b}$, $b=1,2,3$. The function $g(\boldsymbol x;T,M)$ maps a given $\boldsymbol x$ to one of these three values.
  • Figure 2: Three examples of $1-\psi(\boldsymbol x;1,0 , \gamma_j)$ (red) and $\psi(\boldsymbol x;1,0 , \gamma_j)$ (blue) with bandwidth parameters of $0.5$, $0.25$, and $0.1$, corresponding to the move that splits the interval $[L^{(1)}_1,U^{(1)}_1] = [-1,1]$ on the cutpoint $c_{1j} = 0$. The interval $\mathcal{I}_{1j}(\gamma_j)$ (orange) marks the region where the random paths can disagree with the traditional deterministic paths.
  • Figure 3: (Left) An example tree with $B_j = 3$ terminal nodes (red, blue, green). The internal nodes (black) define a set of splitting rules that recursively partition the input space. (Right) Example path probabilities, $\phi_{bj}(\boldsymbol x;T_j,\gamma_j)$, using $\gamma_j = 0.5$. Each curve displays the probability of being mapped to the corresponding terminal node as a function of $x_1$.
  • Figure 4: The semivariogram for $k = 1$ (red), $k = 1.5$ (blue), and $k = 2$ (green) and for different values of the bandwidth hyperparameters $\alpha_1$ and $\alpha_2$. Each semivariogram is generated using $y_\text{min} = -1$, $y_\text{max} = 1$, $\sigma = 0$, $\alpha = 0.95$, and $\beta = 0.5$.
  • Figure 5: (Left) The underlying system $f_\dagger(\boldsymbol x)$. (Center) The output of the first simulator. (Right) The output of the second simulator.
  • ...and 12 more figures
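
The captions for Figures 1 to 3 contrast a deterministic tree, which maps each input to exactly one terminal node, with random paths, which assign each terminal node a probability that varies smoothly near the cutpoints. The sketch below illustrates that contrast on a hypothetical one-dimensional tree with three terminal nodes. The cutpoints, leaf values, and the clipped linear ramp `prob_right` are stand-ins chosen for illustration; the paper's $\psi$ is not reproduced in this extract and may have a different form. Taking the probability-weighted sum of leaf values as the smoothed output is likewise an assumption about how the random path is averaged out.

```python
import numpy as np

# Hypothetical tree: root splits on x < 0.0 (left child is leaf 0);
# the right child splits on x < 0.5 (leaf 1 on the left, leaf 2 on the right).
CUTS = (0.0, 0.5)
MU = np.array([-1.0, 0.5, 2.0])   # hypothetical terminal-node values

def hard_tree(x):
    """Deterministic tree: piecewise-constant output."""
    x = np.asarray(x, dtype=float)
    leaf = np.where(x < CUTS[0], 0, np.where(x < CUTS[1], 1, 2))
    return MU[leaf]

def prob_right(x, cut, gamma):
    """Stand-in for a soft split: probability of moving right.

    Equals 0 or 1 (the deterministic move) outside the interval
    [cut - gamma, cut + gamma] and ramps linearly inside it.
    """
    x = np.asarray(x, dtype=float)
    return np.clip(0.5 + (x - cut) / (2.0 * gamma), 0.0, 1.0)

def path_probs(x, gamma):
    """Probability of reaching each of the three terminal nodes."""
    p0 = prob_right(x, CUTS[0], gamma)
    p1 = prob_right(x, CUTS[1], gamma)
    return np.stack([1.0 - p0, p0 * (1.0 - p1), p0 * p1], axis=-1)

def soft_tree(x, gamma):
    """Probability-weighted sum of leaf values (continuous in x)."""
    return path_probs(x, gamma) @ MU

xs = np.linspace(-1.0, 1.0, 41)
for gamma in (0.5, 0.25, 0.1):    # bandwidths quoted in the Figure 2 caption
    gap = np.max(np.abs(soft_tree(xs, gamma) - hard_tree(xs)))
    print(f"gamma={gamma}: max |soft - hard| on grid = {gap:.3f}")
```

In this stand-in, the bandwidth controls how far from a cutpoint the soft and hard trees can differ: away from every cutpoint by more than `gamma`, the two agree exactly, mirroring the role of the orange interval described in the Figure 2 caption.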

Theorems & Definitions (5)

  • Theorem 3.1
  • Definition 1
  • Theorem 7.1: Conditional Covariance
  • Theorem 7.2
  • Theorem 7.3