Adaptation using spatially distributed Gaussian Processes

Botond Szabo; Amine Hadji; Aad van der Vaart

TL;DR

The paper develops a theoretically grounded framework for spatially distributed Gaussian Process posteriors to enable scalable nonparametric regression while preserving posterior contraction properties. It proves rate-adaptive contraction for aggregated GP posteriors under mild conditions, using local priors based on rescaled integrated Brownian motion or Matérn processes and a fully Bayesian mechanism to learn length scales. A novel aggregation scheme improves continuity across regional boundaries and demonstrates strong empirical performance on synthetic and real data (including superconductivity), with substantial speedups. Overall, the work shows that spatially distributed GP methods can adapt to local regularities and potentially outperform standard GPs in accuracy and uncertainty quantification, while providing theoretical guarantees.

Abstract

We consider the accuracy of an approximate posterior distribution in nonparametric regression problems by combining posterior distributions computed on subsets of the data defined by the locations of the independent variables. We show that this approximate posterior retains the rate of recovery of the full data posterior distribution, where the rate of recovery adapts to the smoothness of the true regression function. As particular examples we consider Gaussian process priors based on integrated Brownian motion and the Matérn kernel augmented with a prior on the length scale. Besides theoretical guarantees we present a numerical study of the methods both on synthetic and real world data. We also propose a new aggregation technique, which numerically outperforms previous approaches. Finally, we demonstrate empirically that spatially distributed methods can adapt to local regularities, potentially outperforming the original Gaussian process.
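The idea in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming a one-dimensional design on $[0,1]$, equal-width regions, a fixed Matérn-3/2 kernel, and a known noise variance; the function names (`matern32`, `gp_posterior`, `spatial_gp`) and all parameter values are hypothetical and not taken from the paper. Each test point is predicted by the local posterior of the region it falls in, with no smoothing across boundaries and no prior on the length scale.

```python
import numpy as np

def matern32(x, y, length_scale):
    """Matérn kernel with smoothness 3/2 and unit variance (illustrative choice)."""
    r = np.abs(x[:, None] - y[None, :]) / length_scale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def gp_posterior(x_train, y_train, x_test, length_scale=0.1, noise_var=0.25):
    """Standard GP regression: posterior mean and pointwise variance of f."""
    if len(x_train) == 0:
        # No data in this region: fall back to the prior (mean 0, variance 1).
        return np.zeros(len(x_test)), np.ones(len(x_test))
    K = matern32(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    K_star = matern32(x_test, x_train, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, K_star.T)
    mean = K_star @ alpha
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is matern32(x, x) = 1
    return mean, var

def spatial_gp(x, y, x_test, m, **kernel_args):
    """Split [0, 1] into m equal intervals and use one local GP per interval."""
    edges = np.linspace(0.0, 1.0, m + 1)
    train_region = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, m - 1)
    test_region = np.clip(np.searchsorted(edges, x_test, side="right") - 1, 0, m - 1)
    mean = np.empty(len(x_test))
    var = np.empty(len(x_test))
    for k in range(m):
        tr, te = train_region == k, test_region == k
        if te.any():
            mean[te], var[te] = gp_posterior(x[tr], y[tr], x_test[te], **kernel_args)
    return mean, var

# Usage on simulated data (the regression function here is made up).
rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = np.sin(2 * np.pi * x) + 0.5 * rng.standard_normal(500)
x_test = np.linspace(0.0, 1.0, 101)
mean, var = spatial_gp(x, y, x_test, m=5)
```

With $m$ regions of roughly $n/m$ points each, the Cholesky factorizations cost on the order of $m\,(n/m)^3 = n^3/m^2$ operations instead of $n^3$, which is the source of the speedup in this kind of scheme.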

Paper Structure (42 sections, 16 theorems, 93 equations, 6 figures, 34 tables)

Introduction
Spatially distributed Bayesian inference with GP priors
Posterior contraction for distributed GP for independent observations
Adaptation
Examples
Rescaled Integrated Brownian Motion
Matérn process
Numerical analysis
Aggregation techniques
Synthetic datasets
Matérn kernel
Real world dataset: Superconductivity
Discussion
Comparison of Scalable Gaussian Process approximations
Proofs of the general results
...and 27 more sections

Key Result

Theorem 2

Let $f_0$ be a bounded function and assume that there exists a sequence $\varepsilon_n\rightarrow 0$ with $(n/m^2)\varepsilon_n^2\rightarrow\infty$ such that $\phi_{f_0}^{(k)}(\varepsilon_n)\leq (n/m)\varepsilon_n^2$ for $k=1,\ldots,m$. Then, under Assumption ass:metric and in the distributed nonparametric regression model def:regress, the aggregated posterior contracts around $f_0$ at the rate $\varepsilon_n$: the posterior mass outside a ball of radius $M_n\varepsilon_n$ around $f_0$ vanishes, for arbitrary $M_n\rightarrow\infty$.
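As an illustration of the condition $(n/m^2)\varepsilon_n^2\rightarrow\infty$, suppose (an assumption made for this example, not part of the statement above) that $\varepsilon_n$ is the usual nonparametric rate $n^{-\beta/(1+2\beta)}$ for a $\beta$-smooth regression function. The condition then limits how fast the number of experts $m$ may grow with $n$:

```latex
\[
\frac{n}{m^2}\,\varepsilon_n^2
  = \frac{n^{\,1-2\beta/(1+2\beta)}}{m^2}
  = \frac{n^{1/(1+2\beta)}}{m^2}
  \longrightarrow \infty
\quad\Longleftrightarrow\quad
m = o\bigl(n^{1/(2+4\beta)}\bigr).
\]
```

For instance, $\beta=1$ gives $m=o(n^{1/6})$, so smoother targets allow fewer experts under this particular choice of rate.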

Figures (6)

Figure 1: Deterministic (oracle) rescaling of the Matérn process prior ($\alpha=3$). Benchmark and distributed GP posteriors. True function $f_0(x)=\sum_{j=4}^{\infty}1.5j^{-3/2}\sin(j)\psi_j(x)$ drawn in black. Posterior means drawn by solid lines, surrounded by $95\%$ pointwise credible sets, shaded between two dotted lines. The five columns correspond (left to right) to the non-distributed method, the distributed method with random partitioning, and the distributed methods with spatial partitioning without smoothing, with inverse variance weights and with exponential weights. From top to bottom the sample sizes are $n=2000,5000,10000$ and the numbers of experts are $m=10,20,50$.
Figure 2: Empirical Bayes (MMLE) approach for the rescaled Matérn process prior ($\alpha=3$). Benchmark and distributed GP posteriors. True function $f_0(x)=\sum_{j=4}^{\infty}1.5j^{-3/2}\sin(j)\psi_j(x)$ drawn in black. Posterior means drawn by solid lines, surrounded by $95\%$ pointwise credible sets, shaded between two dotted lines. The five columns correspond (left to right) to the non-distributed method, the distributed method with random partitioning, and the distributed methods with spatial partitioning without smoothing, with inverse variance weights and with exponential weights. From top to bottom the sample sizes are $n=2000,5000,10000$ and the numbers of experts are $m=10,20,50$.
Figure 3: Hierarchical Bayes method for the rescaled Matérn process prior ($\alpha=3$). Benchmark and distributed GP posteriors. True function $f_0(x)=\sum_{j=4}^{\infty}1.5j^{-3/2}\sin(j)\psi_j(x)$ drawn in black. Posterior means drawn by solid lines, surrounded by $95\%$ pointwise credible sets, shaded between two dotted lines. The five columns correspond (left to right) to the non-distributed method, the distributed method with random partitioning, and the distributed methods with spatial partitioning without smoothing, with inverse variance weights and with exponential weights. From top to bottom the sample sizes are $n=2000,5000,10000$ and the numbers of experts are $m=10,20,50$.
Figure 4: Data-based rescaled (MMLE) squared exponential Gaussian process prior. Benchmark and distributed GP posteriors. True function $f_0$ given in \ref{f0:spatial} is drawn in black. Posterior means drawn by solid lines, surrounded by $95\%$ pointwise credible sets, shaded between two dotted lines. The five columns correspond (left to right) to the non-distributed method, the distributed method with random partitioning, and the distributed methods with spatial partitioning without smoothing, with inverse variance weights and with exponential weights. From top to bottom the sample sizes are $n=1000,2000,5000$ and the numbers of experts are $m=2,4,8$.
Figure 5: Data-based rescaled (MMLE) squared exponential Gaussian process prior. Benchmark and distributed GP posteriors. True function $f_0(x)=\sum_{j=3}^{\infty}2.5 j^{-2}\sin(2j)\psi_j(x)$ drawn in black. Posterior means drawn by solid lines, surrounded by $95\%$ pointwise credible sets, shaded between two dotted lines. The five columns correspond (left to right) to the non-distributed method, the distributed method with random partitioning, and the distributed methods with spatial partitioning without smoothing, with inverse variance weights and with exponential weights. From top to bottom the sample sizes are $n=1000,5000,10000$ and the numbers of experts are $m=5,10,100$.
...and 1 more figures
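
The captions above compare spatial partitioning without smoothing against aggregation with inverse variance weights and with exponential weights. The paper's exact weighting schemes are not reproduced in this summary, so the following is only a generic sketch of inverse-variance (precision-weighted) averaging of several experts' predictions at the same test points. The function name `inverse_variance_combine` is hypothetical, and the combined variance formula assumes the experts' estimates are independent.

```python
import numpy as np

def inverse_variance_combine(means, variances):
    """Precision-weighted average of expert predictions.

    means, variances: arrays of shape (n_experts, n_points).
    Returns the combined mean and variance, each of shape (n_points,).
    Assumption: expert estimates are independent at each point.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances
    weights = precision / precision.sum(axis=0)   # weights sum to 1 per point
    combined_mean = (weights * means).sum(axis=0)
    combined_var = 1.0 / precision.sum(axis=0)
    return combined_mean, combined_var

# Two experts at a single point: the more certain expert dominates.
combined_mean, combined_var = inverse_variance_combine([[0.0], [10.0]], [[1.0], [9.0]])
```

Near a regional boundary, an expert whose data lie far from the test point reports a larger posterior variance and therefore receives a smaller weight, which is what makes this kind of weighting a natural candidate for smoothing across regions.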

Theorems & Definitions (26)

Theorem 2
Remark 3
Theorem 4
Remark 5
Corollary 6
Corollary 7
Corollary 8
Corollary 9
Lemma 10
proof
...and 16 more
