Vecchia Gaussian Processes: on probabilistic and statistical properties

Botond Szabo, Yichen Zhu

TL;DR

This paper systematically studies the Vecchia approximation of the popular isotropic Matérn GP as a standalone stochastic process, uncovers key probabilistic and statistical properties, and proposes selecting parent sets as norming sets with fixed cardinality.

Abstract

Gaussian Processes (GPs) are widely used to model dependencies in spatial statistics and machine learning. However, exact inference is computationally intractable for GP regression, with a time complexity of $O(n^3)$. The Vecchia approximation scales up computation by introducing sparsity into the spatial dependency structure, represented by a directed acyclic graph (DAG). Despite its practical popularity, this approach lacks rigorous theoretical foundations, and the choice of DAG structure remains an open problem. In this paper, we systematically study the Vecchia approximation of the popular, isotropic Matérn GP as a standalone stochastic process and uncover key probabilistic and statistical properties. We propose selecting parent sets as norming sets with fixed cardinality in the Vecchia approximation. On the probabilistic side, we show that the conditional distributions of Matérn GPs, as well as their Vecchia approximations, can be characterized by polynomial interpolations. This enables us to establish several results on small ball probabilities and the Reproducing Kernel Hilbert Spaces (RKHSs) of Vecchia GPs. Building on these probabilistic results, we prove that in the nonparametric regression model, the corresponding posterior contracts around the truth at the optimal minimax rate, both under oracle rescaling and hierarchical tuning of the prior. We illustrate the theoretical findings through numerical experiments on synthetic datasets. Our core algorithms are implemented in C++ with an R interface.
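
To make the sparsity idea in the abstract concrete, the following is a minimal Python sketch of a Vecchia log-density: the joint Gaussian density is replaced by a product of univariate conditionals, each conditioned only on a small parent set. It is an illustration under stated assumptions, not the authors' C++/R implementation. The function names (`matern32`, `vecchia_logpdf`, `exact_logpdf`), the one-dimensional inputs, the Matérn smoothness 3/2, and the jitter term are all hypothetical choices made here. In particular, the parent sets below are the `m` nearest previously ordered points, a common heuristic, and not the norming sets proposed in the paper.

```python
import numpy as np


def matern32(x, y, length_scale=1.0):
    """Matern covariance with smoothness 3/2 between 1-D point arrays x and y."""
    r = np.abs(x[:, None] - y[None, :]) / length_scale
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def vecchia_logpdf(z, x, m, kernel=matern32, jitter=1e-10):
    """Log-density of z under a Vecchia approximation of a zero-mean GP.

    Assumption for illustration: the parent set of point i is the (at most) m
    nearest locations among x[0], ..., x[i-1]. This is NOT the norming-set
    construction of the paper.
    """
    total = 0.0
    for i in range(len(x)):
        prev = np.arange(i)
        # Indices of the m nearest previously ordered locations.
        pa = prev[np.argsort(np.abs(x[prev] - x[i]))[:m]]
        k_ii = kernel(x[i:i + 1], x[i:i + 1])[0, 0] + jitter
        if len(pa) == 0:
            mean, var = 0.0, k_ii
        else:
            K_pp = kernel(x[pa], x[pa]) + jitter * np.eye(len(pa))
            k_ip = kernel(x[i:i + 1], x[pa])[0]
            w = np.linalg.solve(K_pp, k_ip)   # kriging weights on the parents
            mean = w @ z[pa]                  # conditional mean given parents
            var = k_ii - w @ k_ip             # conditional variance given parents
        total += -0.5 * (np.log(2.0 * np.pi * var) + (z[i] - mean) ** 2 / var)
    return total


def exact_logpdf(z, x, kernel=matern32, jitter=1e-10):
    """Exact zero-mean GP log-density via a dense Cholesky factorization."""
    n = len(x)
    L = np.linalg.cholesky(kernel(x, x) + jitter * np.eye(n))
    a = np.linalg.solve(L, z)
    return -0.5 * (a @ a) - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2.0 * np.pi)
```

When `m` is at least `n - 1`, every point is conditioned on all of its predecessors and the product of conditionals coincides with the exact joint density. For fixed `m`, each conditional requires solving an `m`-by-`m` system, so the dense $O(n^3)$ factorization is avoided.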

Paper Structure

This paper contains 72 sections, 23 theorems, 329 equations, 12 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

Let $K(\cdot, \cdot)$ denote the Matérn covariance kernel with regularity parameter $\alpha>0$, see eq:MaternCov. Then $K(\cdot,\cdot)$ is $2\underline{\alpha}$ times differentiable on $\mathbb{R}^{2d}$, such that $\forall \; k_1,k_2\in\mathbb{N}^d$, $|k_1|+|k_2|\le 2\underline{\alpha}$ and $\forall$ ... Furthermore, for all $k\in\mathbb{N}^d$, $|k|\le\underline{\alpha}$ and $\forall \; x_1,x_2\in$ ...

Figures (12)

  • Figure 1: Illustration of layers on a $9\times 9$ grid. Red dots: current layer; blue dots: all previous layers; black crosses: all later layers.
  • Figure 2: Continuing the example in Figure 1, illustration of parent sets for $X_i\in \mathcal{N}_2$, with $\underline{\alpha}=1$. Red dots: current layer $\mathcal{N}_2$; blue dots: previous layers $\mathcal{N}_0$, $\mathcal{N}_1$; black crosses: all later layers. Blue arrows: directed edges from parent sets to children for some $X_i\in \mathcal{N}_2$.
  • Figure A.1: Illustration of nonparametric regression with Vecchia GPs. The black lines and dots represent the true regression function and the observed data, respectively. The colored lines and shaded regions depict the posterior means and the $95\%$ pointwise credible intervals obtained from the two Vecchia GP methods.
  • Figure A.2: Qualitative results of the two Vecchia GP methods (Norming and Maximin) when the true function has Hölder regularity $\beta=1.5$. Left: posterior estimation error, measured by the $L_2$-distance between the true regression function $f_0$ and the posterior mean. Middle: prior approximation error, measured by the squared Wasserstein distance between marginals of the Vecchia GPs and their mother GPs. Right: run time of MCMC inference, measured in seconds.
  • Figure A.3: Qualitative results of three Vecchia GP methods for analytic truth. Left: posterior estimation error, measured by the $L_2$-distance between the truth and the posterior mean. Middle: prior approximation error, measured by the squared Wasserstein distance between the marginals of the Vecchia GPs and their mother GPs. Right: run time of MCMC inference, measured in seconds.
  • ...and 7 more figures

Theorems & Definitions (43)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 3
  • Remark 1
  • Lemma 4
  • Theorem 2: Small deviation bound
  • Lemma 5
  • Lemma 6: Theorem 11.4 of Wendland (2004), Scattered Data Approximation
  • Lemma 7
  • ...and 33 more