Table of Contents
Fetching ...

Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

Tim Gyger, Reinhard Furrer, Fabio Sigrist

TL;DR

This work addresses the scalability gap in Gaussian process inference by introducing Vecchia-inducing-points full-scale (VIF) approximations, which blend global inducing points with local Vecchia residuals to capture both large- and small-scale dependencies. It develops a correlation-based neighbor search via a modified cover tree and extends to non-Gaussian likelihoods using Laplace approximations, supported by iterative solvers and sophisticated preconditioners (VIFDU and FITC) to dramatically reduce compute relative to Cholesky. The authors provide convergence guarantees for the preconditioned CG method, propose efficient predictive-variance estimators, and validate the approach with extensive simulations and real-world datasets, showing superior accuracy and competitive runtimes versus state-of-the-art GP methods. The framework is implemented in GPBoost with Python/R interfaces, enabling practical deployment for large-scale GP problems across domains. Overall, VIF offers a robust, scalable, and accurate pathway for GP inference in both Gaussian and non-Gaussian settings with principled neighbor selection and iterative inference.

Abstract

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitudes compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are both computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.

Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes

TL;DR

This work addresses the scalability gap in Gaussian process inference by introducing Vecchia-inducing-points full-scale (VIF) approximations, which blend global inducing points with local Vecchia residuals to capture both large- and small-scale dependencies. It develops a correlation-based neighbor search via a modified cover tree and extends to non-Gaussian likelihoods using Laplace approximations, supported by iterative solvers and sophisticated preconditioners (VIFDU and FITC) to dramatically reduce compute relative to Cholesky. The authors provide convergence guarantees for the preconditioned CG method, propose efficient predictive-variance estimators, and validate the approach with extensive simulations and real-world datasets, showing superior accuracy and competitive runtimes versus state-of-the-art GP methods. The framework is implemented in GPBoost with Python/R interfaces, enabling practical deployment for large-scale GP problems across domains. Overall, VIF offers a robust, scalable, and accurate pathway for GP inference in both Gaussian and non-Gaussian settings with principled neighbor selection and iterative inference.

Abstract

Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that substantially reduce computational costs for training and prediction by several orders of magnitudes compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are both computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives. All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.

Paper Structure

This paper contains 35 sections, 9 theorems, 100 equations, 14 figures, 7 tables, 4 algorithms.

Key Result

Proposition 2.1

A VIF-approximated posterior predictive distribution for $p(\boldsymbol{y}^p|\boldsymbol{y},\boldsymbol{\theta})$, $\boldsymbol{y}^p=(y(\boldsymbol{s}^p_{ 1}), \ldots, y(\boldsymbol{s}^p_{n_p}))^\mathrm{T} \in \mathbb{R}^{n_p}$, is given by $\mathcal{N}(\boldsymbol{\mu}_\dagger^p,\mathbf{\Sigma}^p_\ where $\mathbf{\Sigma}_{mn_p} = [c_{\boldsymbol{\theta}}(\boldsymbol{s}_i^*, \boldsymbol{s}^p_{j})]

Figures (14)

  • Figure 1: Violin plots of the estimated variance parameter $\sigma_1^2$ for different sample sizes $n$ for binary data. Red dots indicate the mean estimates, and the dashed line marks the true parameter value $\sigma_1^2= 1$.
  • Figure 2: RMSE (log-scale), log-score (LS), and CRPS (log-scale) (mean $\pm$$2$ standard errors) for VIF ($m_v = 30$ & $m = 200$), FITC ($m = 200$), and Vecchia ($m_v = 30$) approximation for varying dimensions $d$ using a 3/2-Matérn kernel.
  • Figure 3: RMSE (log-scale), log-score (LS), and CRPS (log-scale) (mean $\pm$$2$ standard errors) for VIF ($m_v = 30$ & $m = 200$), FITC ($m = 200$), and Vecchia ($m_v = 30$) approximations for 1/2-Matérn, 3/2-Matérn, 5/2-Matérn, and Gaussian ($\infty$-Matérn) ARD kernels for $d = 10$.
  • Figure 4: Accuracy-runtime comparison of preconditioners: RMSE between log‑marginal likelihoods computed using iterative methods and those computed using a Cholesky decomposition versus runtime for the VIFDU and FITC preconditioners and varying numbers of sample vectors $\ell$ (annotated in the plot). A binary likelihood, $n = 100'000$ samples, and three different VIF approximations are used.
  • Figure 5: Accuracy-runtime comparison of simulation- and iterative-methods-based predictive variances (SBPV and SPV algorithms with the FITC and VIFDU preconditioners): RMSE between predictive variances computed using simulation-based methods and those computed using a Cholesky decomposition versus runtime for a binary likelihood. The number of random vectors $\ell$ is annotated in the plot.
  • ...and 9 more figures

Theorems & Definitions (18)

  • Proposition 2.1
  • proof : Proof of Proposition \ref{['pred_dist_VIF']}
  • Proposition 3.1
  • proof : Proof of Proposition \ref{['pred_dist_VIFLaplace']}
  • Proposition 4.1
  • Proposition 4.2
  • Theorem 5.1
  • Theorem 5.2
  • proof : Proof of Proposition \ref{['PropPredVar']}
  • proof : Proof of Proposition \ref{['PropPredVar2']}
  • ...and 8 more