Deep Horseshoe Gaussian Processes

Ismaël Castillo, Thibault Randrianarisoa

TL;DR

This paper introduces the Deep Horseshoe Gaussian Process (Deep-HGP) prior, a simple Bayesian nonparametric construction that stacks Gaussian-process layers whose lengthscale parameters are drawn from half-Horseshoe priors. The prior adapts to the smoothness of the regression function and performs soft, data-driven variable selection across high-dimensional inputs. A key novelty is the freezing-of-paths mechanism: when the scaling parameter of an irrelevant coordinate is shrunk toward zero, sample paths become nearly constant along that direction, which reduces the active dimensionality without hard model selection. The authors establish near-minimax posterior contraction rates that adapt to both smoothness and compositional structure, with explicit dimension dependence that allows the ambient dimension to grow with the sample size. The results cover both shallow and multilayer Deep-HGP priors, and extend to standard posteriors via augmented priors. This yields theoretically grounded Bayesian priors for complex, high-dimensional regression with compositional structure, with practical implications for uncertainty quantification and model interpretation in deep Bayesian nonparametrics.
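
The freezing-of-paths idea can be illustrated numerically. The sketch below is not the authors' code: it assumes the rescaled process $t \mapsto W(a_1 t_1, a_2 t_2)$, where $W$ has SqExp kernel $e^{-\|s-t\|^2}$, so that the rescaled kernel is $\exp(-\sum_i a_i^2 (s_i-t_i)^2)$. The names `scales`, `sample_path`, and `path_variation`, the grid size, and the eigendecomposition-based sampler are all illustrative choices.

```python
import numpy as np

def sqexp_kernel(X, Y, scales):
    """SqExp kernel of the rescaled process t -> W(scales * t)."""
    Xs, Ys = X * scales, Y * scales
    sq = ((Xs[:, None, :] - Ys[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq)

def sample_path(X, scales, rng):
    """Draw one centred Gaussian path at the rows of X.

    An eigendecomposition is used instead of Cholesky because SqExp
    kernel matrices are often numerically singular.
    """
    K = sqexp_kernel(X, X, scales)
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)  # remove tiny negative eigenvalues
    return V @ (np.sqrt(w) * rng.standard_normal(len(X)))

def path_variation(scales, rng, m=12):
    """Return (max range along coordinate 2, max range along coordinate 1)."""
    g = np.linspace(0.0, 1.0, m)
    T1, T2 = np.meshgrid(g, g, indexing="ij")
    X = np.stack([T1.ravel(), T2.ravel()], axis=1)
    F = sample_path(X, scales, rng).reshape(m, m)  # F[i, j] = f(g[i], g[j])
    along_2 = (F.max(axis=1) - F.min(axis=1)).max()
    along_1 = (F.max(axis=0) - F.min(axis=0)).max()
    return along_2, along_1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Second coordinate has a near-zero scaling: the path is "frozen" there.
    frozen, active = path_variation(np.array([3.0, 1e-3]), rng)
    print(f"range along frozen coordinate: {frozen:.2e}")
    print(f"range along active coordinate: {active:.2e}")
```

With a scaling near zero on the second coordinate, the sampled surface varies visibly only along the first coordinate, which is the sense in which the effective dimension drops from 2 to 1 in this toy example.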

Abstract

Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep-HGP), a new simple prior based on deep Gaussian processes with a squared-exponential kernel, that in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates on the dimension is explicit, allowing in particular for input spaces of dimension increasing with the number of observations.
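
To give a feel for why a Horseshoe-type prior suits data-driven scaling choices, the sketch below draws from a half-Horseshoe distribution using the standard scale-mixture representation (a half-Cauchy local scale times a half-normal). This parameterisation, the global parameter `tau`, and the function name `sample_half_horseshoe` are assumptions made for illustration; the paper's exact prior may be specified differently.

```python
import numpy as np

def sample_half_horseshoe(size, tau=1.0, rng=None):
    """Draw |theta| where theta | lam ~ N(0, tau^2 lam^2), lam ~ C+(0, 1).

    Assumed parameterisation: the usual Horseshoe scale mixture,
    folded onto the positive half-line.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = np.abs(rng.standard_cauchy(size))  # half-Cauchy local scale
    z = np.abs(rng.standard_normal(size))    # half-normal factor
    return tau * lam * z

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = sample_half_horseshoe(200_000, tau=1.0, rng=rng)
    half_normal = np.abs(rng.standard_normal(200_000))
    # Compare mass near zero and in the tail with a half-normal reference.
    print("P(a < 0.1):  horseshoe", (a < 0.1).mean(),
          " half-normal", (half_normal < 0.1).mean())
    print("P(a > 10):   horseshoe", (a > 10).mean(),
          " half-normal", (half_normal > 10).mean())
```

Compared with a half-normal draw, the half-Horseshoe places more mass near zero (many coordinates can be nearly switched off) while keeping a heavy tail (a few coordinates can receive large scalings).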

Paper Structure (30 sections, 34 theorems, 271 equations, 1 figure)

Key Result

Theorem 1

Let $d\ge 2$ be a fixed integer and for $K\ge 1, \beta>0$, set $\mathcal{F}(K):=\mathcal{F}_{VS}(K,\beta, d, d^*)$. Fix $\rho\in(0,1)$, let $f_0\in \mathcal{F}(K)$ and suppose with $S_0:=\{i_1,\ldots,i_{d^*}\}\subset\{1,\ldots,n\}$ and some $1\le d^* \le d$. Let $\Pi$ be a multibandwidth prior with $W$ a $d$--dimensional SqExp Gaussian process with deterministic scaling parameters Then, there ex

Figures (1)

  • Figure 1: Composition of two Gaussian processes with SqExp covariance kernel $K(s,t)=e^{-(s-t)^2}$.
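
A composition of this kind can be simulated in a few lines. The sketch below is an illustration in the spirit of Figure 1, not the code used to produce it: it samples an inner path $g_1$ on a grid and then samples an independent outer path $g_2$ at the points $g_1(t_i)$, both with kernel $K(s,t)=e^{-(s-t)^2}$. The grid, the function names, and the eigendecomposition-based sampler are assumptions.

```python
import numpy as np

def sqexp(s, t):
    """SqExp kernel K(s, t) = exp(-(s - t)^2) for 1-D inputs."""
    return np.exp(-(s[:, None] - t[None, :]) ** 2)

def sample_gp(points, rng):
    """Draw a centred GP with SqExp kernel at the given 1-D points."""
    K = sqexp(points, points)
    w, V = np.linalg.eigh(K)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return V @ (np.sqrt(w) * rng.standard_normal(len(points)))

def sample_deep_gp(t, rng, depth=2):
    """Sample g_depth o ... o g_1 at the grid t, with independent layers.

    Each layer is drawn at the outputs of the previous one, which is an
    exact draw of the composition restricted to the grid.
    """
    h = t
    for _ in range(depth):
        h = sample_gp(h, rng)
    return h

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t = np.linspace(-3.0, 3.0, 60)
    path = sample_deep_gp(t, rng, depth=2)
    print(path[:5])
```

Because the outer layer is evaluated at the inner layer's outputs, grid points where the inner path takes nearly equal values receive nearly equal outer values, which is what gives composed paths their characteristic plateaus and folds.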

Theorems & Definitions (63)

  • Theorem 1: freezing of paths
  • Theorem 2: Single layer, generic result
  • Corollary 1: Optimal $a^*$ and posterior rate
  • Example 1: Exponential prior with fixed scaling $\lambda$
  • Example 2: Horseshoe prior with fixed parameter $\tau$
  • Corollary 2: Fixed dimensions
  • Example 3: Horseshoe prior with vanishing parameter $\tau$
  • Corollary 3: High-dimensional horseshoe GP
  • Theorem 3
  • Remark 1: Benign overfitting
  • ...and 53 more