Adaptive sparse variational approximations for Gaussian process regression

Dennis Nieman, Botond Szabó

TL;DR

Theoretical guarantees for hyperparameter selection using variational Bayes in the nonparametric regression model are presented, a variational approximation to a hierarchical Bayes procedure is constructed, and upper bounds for the contraction rate of the variational posterior are derived.

Abstract

Accurate tuning of hyperparameters is crucial to ensure that models can generalise effectively across different settings. In this paper, we present theoretical guarantees for hyperparameter selection using variational Bayes in the nonparametric regression model. We construct a variational approximation to a hierarchical Bayes procedure, and derive upper bounds for the contraction rate of the variational posterior in an abstract setting. The theory is applied to various Gaussian process priors and variational classes, resulting in minimax optimal rates. Our theoretical results are accompanied by numerical analysis on both synthetic and real-world data sets.
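To make the idea of variational hyperparameter selection concrete, the following is a minimal sketch and not the authors' construction. It assumes a squared exponential kernel, a known noise variance, inducing points at fixed locations (swapped in here for the spectral features mentioned in the figure captions), and the standard collapsed evidence lower bound for sparse Gaussian process regression. The lengthscale is chosen from a finite grid by maximising that bound. All function names, the grid, and the simulated data are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale):
    """Squared exponential kernel with unit signal variance (assumed)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def collapsed_elbo(x, y, z, lengthscale, noise_var):
    """Collapsed evidence lower bound for sparse GP regression with
    inducing inputs z: log N(y | 0, Q + s2*I) - tr(K - Q) / (2*s2)."""
    n = x.size
    # Small jitter keeps the inducing-point Gram matrix well conditioned.
    k_mm = sq_exp_kernel(z, z, lengthscale) + 1e-6 * np.eye(z.size)
    k_nm = sq_exp_kernel(x, z, lengthscale)
    q_nn = k_nm @ np.linalg.solve(k_mm, k_nm.T)  # low-rank approximation of K
    cov = q_nn + noise_var * np.eye(n)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    log_marginal = -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    # The kernel has unit diagonal, so tr(K) equals n.
    trace_gap = n - np.trace(q_nn)
    return log_marginal - 0.5 * trace_gap / noise_var

def select_lengthscale(x, y, z, grid, noise_var):
    """Return the grid value with the largest lower bound, plus all scores."""
    scores = [collapsed_elbo(x, y, z, ell, noise_var) for ell in grid]
    return grid[int(np.argmax(scores))], scores

# Hypothetical synthetic data: a smooth signal observed with Gaussian noise.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 80))
noise_var = 0.01
y = np.sin(2 * np.pi * x) + np.sqrt(noise_var) * rng.standard_normal(x.size)

z = np.linspace(0.0, 1.0, 15)          # fixed inducing inputs (assumed)
grid = [0.02, 0.05, 0.1, 0.2, 0.4]     # finite hyperparameter set (assumed)
best, scores = select_lengthscale(x, y, z, grid, noise_var)
print("selected lengthscale:", best)
```

The grid plays the role of a finite hyperparameter set, and the bound is maximised over it directly. A gradient-based optimiser over a continuous range would be the usual alternative in practice.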

Paper Structure

This paper contains 25 sections, 12 theorems, 187 equations, 5 figures.

Key Result

Theorem 1

Suppose that $\min_{\lambda\in\Lambda_n} n\epsilon_n^2(\lambda) \to \infty$ and $\log |\Lambda_n| = o(n\delta_n^2)$. Then under conditions e:eigBound, e:f0.tail, e:priortail, e:prior.reg.sc and e:hyper.mass, there exist a constant $M>0$ and an event $\mathcal{A}_n$ with $\mathbb P_{f_0}^{(n)}(\ldots)$ …

Figures (5)

  • Figure 1: Variational posterior with $m=21$ population features based on polynomially decaying series prior \ref{def:prior:poly}. Data simulated as in \ref{e:sim.poly.data} with $f_0$ given in \ref{e:sim.poly.f0}. The shaded region indicates $95\%$ pointwise credible regions of the variational posterior.
  • Figure 2: Setting exactly as in Figure 1, but with $m=42$ features.
  • Figure 3: Mean and $95\%$ credible region of the variational posterior for runner's speed based on the squared exponential prior \ref{e:kernel.sq}. A variational approximation was used with $m=150$ sample spectral features, where hyperparameters and model variance were tuned using the evidence lower bound. Optimisation time: 286 sec.
  • Figure 4: Zoom of variational procedure in Figure 3 with plot of data points.
  • Figure 5: Empirical Bayes posterior mean and pointwise $95\%$ credible region for the running data with squared exponential prior \ref{e:kernel.sq}. Optimisation time: 817 sec.

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • Corollary 3
  • Corollary 4
  • Corollary 5
  • Lemma 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 2 more