Iterative Methods for Full-Scale Gaussian Process Approximations for Large Spatial Data

Tim Gyger, Reinhard Furrer, Fabio Sigrist

TL;DR

This work addresses the limited scalability of Gaussian process inference for large spatial data by combining full-scale approximations (FSAs), which pair predictive process methods with covariance tapering, with fast iterative solvers. It introduces a novel FITC preconditioner that accelerates conjugate gradient convergence and reduces its sensitivity to the FSA parameters, while log-determinants and gradients are computed with stochastic estimators and predictive variances with a fast simulation-based method. The authors support the preconditioner with theoretical convergence results and extensive simulations, which show that the iterative approach matches the accuracy of Cholesky-based computations at substantially lower cost, and they extend the preconditioner to Vecchia approximations. A real-world study on MODIS Terra data shows that iterative inference delivers results comparable to Cholesky-based computations roughly an order of magnitude faster. The software is released as an open-source C++ library with Python and R interfaces.

Abstract

Gaussian processes are flexible probabilistic regression models which are widely used in statistics and machine learning. However, a drawback is their limited scalability to large data sets. To alleviate this, full-scale approximations (FSAs) combine predictive process methods and covariance tapering, thus approximating both global and local structures. We show how iterative methods can be used to reduce computational costs in calculating likelihoods, gradients, and predictive distributions with FSAs. In particular, we introduce a novel preconditioner and show theoretically and empirically that it accelerates the conjugate gradient method's convergence speed and mitigates its sensitivity with respect to the FSA parameters and the eigenvalue structure of the original covariance matrix, and we demonstrate empirically that it outperforms a state-of-the-art pivoted Cholesky preconditioner. Furthermore, we introduce an accurate and fast way to calculate predictive variances using stochastic simulation and iterative methods. In addition, we show how our newly proposed fully independent training conditional (FITC) preconditioner can also be used in iterative methods for Vecchia approximations. In our experiments, it outperforms existing state-of-the-art preconditioners for Vecchia approximations. All methods are implemented in a free C++ software library with high-level Python and R packages.
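To make the ideas in the abstract concrete, the following is a minimal dense NumPy sketch, not the authors' C++ implementation. It assumes an exponential covariance, a Wendland-type taper, inducing points on a regular grid, and a FITC-style preconditioner of the form "low-rank part plus diagonal of the residual plus nugget", applied through the Woodbury identity. All function names, parameter values, and the small problem size are illustrative assumptions.

```python
import numpy as np

def exp_cov(a, b, rho=0.3):
    """Exponential covariance (variance 1) and pairwise distances."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-dist / rho), dist

def wendland_taper(dist, gamma):
    """Compactly supported Wendland taper; zero beyond the taper range gamma."""
    r = np.clip(dist / gamma, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

def pcg(matvec, b, precond=None, tol=1e-8, max_iter=2000):
    """(Preconditioned) conjugate gradient; returns solution and iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r) if precond else r
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = precond(r) if precond else r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Illustrative problem (all values are assumptions chosen for a quick demo).
rng = np.random.default_rng(0)
n, gamma, sigma2 = 500, 0.1, 0.05
X = rng.uniform(size=(n, 2))                       # observation locations
g = np.linspace(0.05, 0.95, 7)
Z = np.array([(u, v) for u in g for v in g])       # m = 49 inducing points

Sigma, dist = exp_cov(X, X)
Sigma_m, _ = exp_cov(Z, Z)
Sigma_mn, _ = exp_cov(Z, X)

# Full-scale approximation: predictive process part + tapered residual + nugget.
Q = Sigma_mn.T @ np.linalg.solve(Sigma_m, Sigma_mn)
R = Sigma - Q
A = Q + wendland_taper(dist, gamma) * R + sigma2 * np.eye(n)

# FITC-style preconditioner P = diag(d) + Sigma_mn^T Sigma_m^{-1} Sigma_mn,
# with P^{-1} applied via the Woodbury identity (only an m x m solve).
d = np.diag(R) + sigma2
M = Sigma_m + (Sigma_mn / d) @ Sigma_mn.T

def fitc_precond(v):
    w = v / d
    t = np.linalg.solve(M, Sigma_mn @ w)
    return w - (Sigma_mn.T @ t) / d

y = rng.standard_normal(n)
x_plain, it_plain = pcg(lambda v: A @ v, y)
x_pre, it_pre = pcg(lambda v: A @ v, y, precond=fitc_precond)
print("CG iterations without / with preconditioner:", it_plain, it_pre)
```

In a large-scale setting the tapered residual would be stored as a sparse matrix and the low-rank part never formed explicitly; the dense matrices here only keep the sketch short.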

Paper Structure

This paper contains 34 sections, 6 theorems, 77 equations, 14 figures, 6 tables, 2 algorithms.

Key Result

Proposition 3.1

The predictive-variance algorithm (alg:pred_var) produces unbiased and consistent estimates $\boldsymbol{D}^p$ of the predictive variance $\text{diag}(\boldsymbol{\Sigma}^p_\dagger)$ given in (MuPredFSA).
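The kind of estimator this proposition refers to can be illustrated with a generic stochastic diagonal estimator. The sketch below is not the paper's algorithm: it uses an exact one-dimensional Gaussian process instead of an FSA, a direct solve as a stand-in for the conjugate gradient method, and Rademacher probe vectors $\boldsymbol{z}$, for which $E[\boldsymbol{z} \circ (\boldsymbol{B}\boldsymbol{z})] = \text{diag}(\boldsymbol{B})$. The kernel, locations, and number of probes are assumptions made for illustration.

```python
import numpy as np

def kernel(a, b, rho=0.1):
    """Exponential covariance (variance 1) on the real line."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / rho)

def stochastic_diag(apply_B, dim, num_probes, rng):
    """Estimate diag(B) as the average of z * (B z) over Rademacher probes z."""
    Z = rng.choice([-1.0, 1.0], size=(dim, num_probes))
    return np.mean(Z * apply_B(Z), axis=1)

rng = np.random.default_rng(1)
x_train = np.linspace(0.0, 1.0, 60)
x_pred = np.linspace(0.01, 0.99, 40)
sigma2 = 0.1

A = kernel(x_train, x_train) + sigma2 * np.eye(x_train.size)
K_pn = kernel(x_pred, x_train)

def apply_B(Z):
    # B = K_pn A^{-1} K_np applied to all probes at once; in an iterative
    # setting the solve would be replaced by (preconditioned) CG runs.
    return K_pn @ np.linalg.solve(A, K_pn.T @ Z)

# Predictive variance of the response: prior variance + nugget - diag(B).
exact = 1.0 + sigma2 - np.einsum("ij,ji->i", K_pn, np.linalg.solve(A, K_pn.T))
estimate = 1.0 + sigma2 - stochastic_diag(apply_B, x_pred.size, 20000, rng)
max_err = np.max(np.abs(estimate - exact))
print("maximum absolute error of the stochastic estimate:", max_err)
```

Because each probe only requires matrix-vector products and linear solves, the same pattern carries over to settings where the covariance matrix is never factorized.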

Figures (14)

  • Figure 1: Box-plots of the negative log-likelihood for the FITC approximation for different effective ranges (0.5, 0.2, 0.05 from left to right) and numbers of inducing points $m$ ($n = 100'000$).
  • Figure 2: Number of iterations (in log-scale) used in the CG method with and without the FITC preconditioner for calculating $\Tilde{\boldsymbol{\Sigma}}_{\dagger}^{-1}\boldsymbol{y}$ for simulated data with an effective range of 0.2. Left: Different sample sizes $n$ with constant $n_\gamma = 80$ and $m = 500$. Middle: Different taper ranges $\gamma$ with constant $n = 100'000$ and $m = 500$. Right: Different numbers of inducing points $m$ with constant $n = 100'000$ and $n_\gamma = 80$.
  • Figure 3: Comparison of the Lanczos and stochastic estimation methods for predictive variances when using an effective range of 0.2. The dashed black line corresponds to the computations based on Cholesky decomposition. The numbers next to the points correspond to the number of sample vectors or the rank, respectively. For the stochastic approach, the respective mean is shown ($n_p = 100'000$, $n = 100'000$, $m = 500$, $n_\gamma = 80$).
  • Figure 4: Time (s) for computing the negative log-likelihood using a Cholesky decomposition and iterative methods for simulated data for varying sample sizes $n$, taper ranges $\gamma$, and numbers of inducing points $m$.
  • Figure 5: Box-plots of the relative error of the negative log-likelihood computed with and without the FITC preconditioner on simulated random fields with effective ranges of 0.5, 0.2, and 0.05, respectively, for the true population parameters ($n = 100'000$, $n_\gamma = 80$, $m = 500$).
  • ...and 9 more figures

Theorems & Definitions (13)

  • Proposition 3.1
  • Theorem 3.2
  • Theorem 3.3
  • proof: Proof of Proposition 3.1
  • Definition G.1
  • Lemma G.1
  • proof
  • Lemma G.2
  • proof
  • proof: Proof of Theorem \ref{thm1}
  • ...and 3 more