Gaussian Process Boosting

Fabio Sigrist

TL;DR

Combines boosting with Gaussian process and mixed effects models, and adds an extension that scales to large data using a Vecchia approximation for the Gaussian process model, relying on novel results for covariance parameter inference. The approach obtains higher prediction accuracy than existing approaches on multiple simulated and real-world data sets.

Abstract

We introduce a novel way to combine boosting with Gaussian process and mixed effects models. This allows for relaxing, first, the zero or linearity assumption for the prior mean function in Gaussian process and grouped random effects models in a flexible non-parametric way and, second, the independence assumption made in most boosting algorithms. The former is advantageous for prediction accuracy and for avoiding model misspecifications. The latter is important for efficient learning of the fixed effects predictor function and for obtaining probabilistic predictions. Our proposed algorithm is also a novel solution for handling high-cardinality categorical variables in tree-boosting. In addition, we present an extension that scales to large data using a Vecchia approximation for the Gaussian process model relying on novel results for covariance parameter inference. We obtain increased prediction accuracy compared to existing approaches on multiple simulated and real-world data sets.
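The combination described in the abstract can be illustrated with a small sketch. It is not the paper's algorithm or any library's implementation. It assumes a grouped random effects model $y = F(x) + Zb + \epsilon$ with covariance $\Psi = \sigma_b^2 ZZ^T + \sigma^2 I$ and alternates between two steps: choosing the variance parameters that minimize the Gaussian negative log-likelihood for the current $F$, and taking a functional gradient step on $F$ using the gradient $\Psi^{-1}(F - y)$. The stump base learner, the grid search over variances, the simulated data, and all function names (`fit_stump`, `neg_log_lik`, `boost_with_random_effects`) are hypothetical choices made only for illustration.

```python
import numpy as np


def fit_stump(x, target):
    """Least-squares regression stump on one continuous feature (stand-in base learner)."""
    best = None
    for split in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left = x <= split
        lo, hi = target[left].mean(), target[~left].mean()
        sse = np.sum((target - np.where(left, lo, hi)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, split, lo, hi)
    _, split, lo, hi = best
    return lambda z: np.where(z <= split, lo, hi)


def neg_log_lik(resid, ZZt, sigma2, sigma2_b):
    """Gaussian negative log-likelihood of resid = y - F under Psi = sigma2_b*ZZt + sigma2*I."""
    n = resid.size
    Psi = sigma2_b * ZZt + sigma2 * np.eye(n)
    _, logdet = np.linalg.slogdet(Psi)
    quad = resid @ np.linalg.solve(Psi, resid)
    return 0.5 * quad + 0.5 * logdet + 0.5 * n * np.log(2 * np.pi)


def boost_with_random_effects(x, y, group, n_iter=30, lr=0.1,
                              grid=(0.25, 0.5, 1.0, 2.0)):
    # Incidence matrix Z of the grouping variable; ZZt[i, j] = 1 if same group.
    Z = (group[:, None] == np.unique(group)[None, :]).astype(float)
    ZZt = Z @ Z.T
    F = np.full(y.shape, y.mean())
    history = []
    for _ in range(n_iter):
        resid = y - F
        # Step 1 (assumption: crude grid search instead of a proper optimizer).
        nll, s2, s2b = min((neg_log_lik(resid, ZZt, a, b), a, b)
                           for a in grid for b in grid)
        history.append(nll)
        # Step 2: negative gradient of the likelihood w.r.t. F is Psi^{-1}(y - F).
        Psi = s2b * ZZt + s2 * np.eye(y.size)
        neg_grad = np.linalg.solve(Psi, resid)
        F = F + lr * fit_stump(x, neg_grad)(x)
    return F, (s2, s2b), history


# Simulated data (assumption): 20 groups of 10 observations each.
rng = np.random.default_rng(0)
n, n_groups = 200, 20
group = np.repeat(np.arange(n_groups), n // n_groups)
x = rng.uniform(-2, 2, n)
b = rng.normal(0, 1, n_groups)
y = np.sin(2 * x) + b[group] + rng.normal(0, 0.5, n)

F, theta, history = boost_with_random_effects(x, y, group)
```

In this sketch the learning rate is kept below twice the smallest variance in the grid, which is enough for each gradient step to not increase the negative log-likelihood. A practical implementation would use regression trees and a proper optimizer for the covariance parameters.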

Paper Structure

This paper contains 32 sections, 4 theorems, 75 equations, 6 figures, 11 tables, 2 algorithms.

Key Result

Proposition 3.1

The gradient of the negative log-likelihood $\tilde{L}(y,F,\theta)$ for the Vecchia approximation given in vecchia_approx can be calculated as where and $\frac{\partial B}{\partial \theta_k}$ are lower triangular and $\frac{\partial D}{\partial \theta_k}$ diagonal matrices with non-zero entries given by for $1< k\leq q$, and for $k=1$, the non-zero entries of $\frac{\partial B}{\partial \theta_
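The matrices $B$ and $D$ in the proposition come from the Vecchia approximation, which replaces the precision matrix of the Gaussian process by $B^T D^{-1} B$ with $B$ lower triangular and $D$ diagonal. The following sketch shows one generic way to build such factors and evaluate the approximate negative log-likelihood. It is not the paper's implementation: it assumes each observation is conditioned on the previous `m` points in a fixed ordering (rather than on selected nearest neighbors), uses an exponential covariance with a nugget as an arbitrary example, and the names `vecchia_factors` and `vecchia_nll` are hypothetical.

```python
import numpy as np


def vecchia_factors(Sigma, m):
    """Return unit lower-triangular B and the diagonal of D so that
    the precision matrix is approximated by B.T @ diag(1/D) @ B."""
    n = Sigma.shape[0]
    B = np.eye(n)
    D = np.empty(n)
    D[0] = Sigma[0, 0]
    for i in range(1, n):
        # Assumption: condition on the previous m points in the ordering.
        nb = np.arange(max(0, i - m), i)
        A = np.linalg.solve(Sigma[np.ix_(nb, nb)], Sigma[nb, i])
        B[i, nb] = -A                             # regression coefficients
        D[i] = Sigma[i, i] - Sigma[i, nb] @ A     # conditional variance
    return B, D


def vecchia_nll(resid, Sigma, m):
    """Approximate Gaussian negative log-likelihood of resid = y - F."""
    B, D = vecchia_factors(Sigma, m)
    u = B @ resid
    n = resid.size
    return 0.5 * np.sum(u ** 2 / D) + 0.5 * np.sum(np.log(D)) + 0.5 * n * np.log(2 * np.pi)


# Example data (assumption): sorted 1-D locations, exponential covariance plus nugget.
rng = np.random.default_rng(1)
n = 60
s = np.sort(rng.uniform(0, 1, n))
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.2) + 0.1 * np.eye(n)
resid = rng.multivariate_normal(np.zeros(n), Sigma)

_, logdet = np.linalg.slogdet(Sigma)
exact = 0.5 * resid @ np.linalg.solve(Sigma, resid) + 0.5 * logdet + 0.5 * n * np.log(2 * np.pi)
approx_full = vecchia_nll(resid, Sigma, m=n - 1)   # conditioning on all previous points
approx_small = vecchia_nll(resid, Sigma, m=5)      # sparse approximation
```

Conditioning on all previous points recovers the exact likelihood, which gives a simple sanity check. Smaller `m` makes the rows of $B$ sparse, and that sparsity is the source of the scalability.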

Figures (6)

  • Figure 1: Example of locations for training and test data for the spatial data. "Test" and "Test_ext" refers to locations of the "interpolation" and "extrapolation" test data sets, respectively. The black crosses show examples of locations for which predictions of sums are made.
  • Figure 2: Test RMSE for the wages data for every fold separately.
  • Figure 3: Illustration of house price data: map with observation locations and log-prices (left plot) and smoothed differences of log-prices from the global mean (right plot).
  • Figure 4: Prediction accuracy for the housing data for every fold separately. Results for the predictions of sums are denoted by '_sum'. 'QL' denotes the quantile loss.
  • Figure 5: Violin plots illustrating the prediction accuracy for the small subsets of the housing data. The red rhombi represent means over the sample splits.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Proof of Proposition 3.1
  • Proof of Proposition 3.2
  • Proof of Proposition 3.3
  • Proof of Proposition 3.4