Deep Horseshoe Gaussian Processes

Ismaël Castillo; Thibault Randrianarisoa

TL;DR

This paper introduces the Deep Horseshoe Gaussian Process (Deep-HGP) prior, a simple Bayesian nonparametric construction that stacks Gaussian-process layers whose lengthscale parameters are drawn from half-Horseshoe priors. The construction adapts to the smoothness of the regression function and performs soft, data-driven variable selection across high-dimensional inputs. A key novelty is the freezing-of-paths mechanism: shrinking the scaling parameters attached to irrelevant coordinates makes sample paths nearly constant along those directions, which reduces the active dimensionality without hard model selection. The authors establish near-minimax posterior contraction rates that adapt to both smoothness and compositional structure, with explicit dependence on the dimension so that the ambient dimension may grow with the sample size. The results cover both the single-layer (shallow) and multilayer Deep-HGP priors, and extend to standard posteriors via augmented priors. The outcome is a theoretically grounded family of Bayesian priors for high-dimensional regression with compositional structure, with implications for uncertainty quantification and model interpretation in deep Bayesian nonparametrics.
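The freezing-of-paths idea can be illustrated numerically. The sketch below is not the authors' code: it assumes an anisotropic squared-exponential kernel of the form $\exp(-\sum_i a_i^2 (x_i-y_i)^2)$, where the scalings $a_i$ multiply the coordinates, and it assumes the usual horseshoe hierarchy (half-Cauchy local scale times a Gaussian) for the half-Horseshoe draw. All function names and the values of `tau` and the scalings are hypothetical choices made for illustration.

```python
import numpy as np


def sqexp_kernel(X, Y, scalings):
    """Anisotropic SqExp kernel exp(-sum_i a_i^2 (x_i - y_i)^2) (assumed form)."""
    diff = (X[:, None, :] - Y[None, :, :]) * scalings
    return np.exp(-np.sum(diff**2, axis=-1))


def sample_half_horseshoe(rng, size, tau=1.0):
    """|theta| with theta | lam ~ N(0, tau^2 lam^2) and lam ~ half-Cauchy(0, 1)."""
    lam = np.abs(rng.standard_cauchy(size))
    return tau * lam * np.abs(rng.standard_normal(size))


def sample_gp(rng, X, scalings):
    """Draw a centred GP path at the rows of X via an eigendecomposition."""
    K = sqexp_kernel(X, X, scalings)
    w, V = np.linalg.eigh(K)
    # Clip tiny negative eigenvalues caused by rounding before taking roots.
    return V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(X)))


def increment_variance(x, y, scalings):
    """Var(W(x) - W(y)) = 2 - 2 K(x, y) for a unit-variance kernel."""
    return 2.0 - 2.0 * sqexp_kernel(x[None, :], y[None, :], scalings)[0, 0]


rng = np.random.default_rng(0)
# Move only along the second coordinate, keeping the first one fixed.
line = np.column_stack([np.full(30, 0.3), np.linspace(0.0, 1.0, 30)])
frozen = sample_gp(rng, line, np.array([1.0, 1e-3]))  # tiny scaling: nearly flat
active = sample_gp(rng, line, np.array([1.0, 3.0]))   # large scaling: path varies
random_scalings = sample_half_horseshoe(rng, size=2, tau=0.1)
```

Under these assumptions, the variance of an increment along a coordinate with scaling $a$ over a distance $\Delta$ is $2(1-e^{-a^2\Delta^2})$, which vanishes as $a \to 0$. This is the sense in which a heavily shrunk scaling makes the path behave as if that coordinate were absent.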

Abstract

Deep Gaussian processes have recently been proposed as natural objects to fit, similarly to deep neural networks, possibly complex features present in modern data samples, such as compositional structures. Adopting a Bayesian nonparametric approach, it is natural to use deep Gaussian processes as prior distributions, and use the corresponding posterior distributions for statistical inference. We introduce the deep Horseshoe Gaussian process (Deep-HGP), a new simple prior based on deep Gaussian processes with a squared-exponential kernel, which in particular enables data-driven choices of the key lengthscale parameters. For nonparametric regression with random design, we show that the associated posterior distribution recovers the unknown true regression curve optimally in terms of quadratic loss, up to a logarithmic factor, in an adaptive way. The convergence rates are simultaneously adaptive to both the smoothness of the regression function and to its structure in terms of compositions. The dependence of the rates on the dimension is explicit, allowing in particular for input spaces of dimension increasing with the number of observations.

Paper Structure

This paper contains 30 sections, 34 theorems, 271 equations, 1 figure.

Introduction
The deep horseshoe GP prior
Structural assumptions for multivariate regression
Key ingredients
Deep Horseshoe Gaussian Process prior
Main results I: shallow case and "freezing of paths"
"Freezing of paths" for effective variable selection: a new property of scalings of Gaussian processes
Single layer setting: horseshoe GP
Main results II: deep simultaneous adaptation to structure and smoothness
Multilayer setting: Deep Horseshoe GP
Results for standard posteriors
Discussion and open questions
Proof of the main results
Proof of Theorem \ref{thmtoy}
Proof of Theorem \ref{thmvs}
...and 15 more sections

Key Result

Theorem 1

Let $d\ge 2$ be a fixed integer and for $K\ge 1, \beta>0$, set $\mathcal{F}(K):=\mathcal{F}_{VS}(K,\beta, d, d^*)$. Fix $\rho\in(0,1)$, let $f_0\in \mathcal{F}(K)$ and suppose with $S_0:=\{i_1,\ldots,i_{d^*}\}\subset\{1,\ldots,n\}$ and some $1\le d^* \le d$. Let $\Pi$ be a multibandwidth prior with $W$ a $d$-dimensional SqExp Gaussian process with deterministic scaling parameters. Then, there ex

Figures (1)

Figure 1: Composition of two Gaussian processes with SqExp covariance kernel $K(s,t)=e^{-(s-t)^2}$.
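A composition like the one in Figure 1 can be simulated in a few lines. The following is a minimal sketch under stated assumptions, not a reproduction of the figure: the input grid on $[0,1]$, the seed, and the function names are hypothetical, and only the kernel $K(s,t)=e^{-(s-t)^2}$ is taken from the caption. The inner process is sampled on the grid, and an independent outer process is then sampled at the values taken by the inner one.

```python
import numpy as np


def sqexp(s, t):
    """SqExp covariance K(s, t) = exp(-(s - t)^2) between two 1-D arrays."""
    return np.exp(-(s[:, None] - t[None, :]) ** 2)


def sample_gp_at(rng, points):
    """Draw a centred GP with the SqExp kernel at the given 1-D points."""
    w, V = np.linalg.eigh(sqexp(points, points))
    # Clip tiny negative eigenvalues caused by rounding before taking roots.
    return V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(points)))


def sample_composition(rng, grid):
    """Return (g1(grid), g2(g1(grid))) for independent GPs g1 and g2."""
    inner = sample_gp_at(rng, grid)    # g1 evaluated on the input grid
    outer = sample_gp_at(rng, inner)   # g2 evaluated at the values of g1
    return inner, outer


grid = np.linspace(0.0, 1.0, 50)
inner, outer = sample_composition(np.random.default_rng(0), grid)
```

Plotting `outer` against `grid` shows one draw of the composed path. Sampling the outer process only at the inner values avoids discretising its whole domain, at the cost of a fresh eigendecomposition for each draw.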

Theorems & Definitions (63)

Theorem 1: freezing of paths
Theorem 2: Single layer, generic result
Corollary 1: Optimal $a^*$ and posterior rate
Example 1: Exponential prior with fixed scaling $\lambda$
Example 2: Horseshoe prior with fixed parameter $\tau$
Corollary 2: Fixed dimensions
Example 3: Horseshoe prior with vanishing parameter $\tau$
Corollary 3: High-dimensional horseshoe GP
Theorem 3
Remark 1: Benign overfitting
...and 53 more
