Iterative Methods for Vecchia-Laplace Approximations for Latent Gaussian Process Models

Pascal Kündig; Fabio Sigrist

TL;DR

This paper addresses the computational bottlenecks of Vecchia-Laplace approximations for latent Gaussian process models with non-Gaussian likelihoods by introducing iterative methods based on the preconditioned conjugate gradient (CG) method and stochastic Lanczos quadrature. It proposes a suite of preconditioners (VADU, LVA, LRAC, ZIRC), derives convergence guarantees, and presents variance-reduction strategies for stochastic gradients. Two approaches to computing predictive covariances are developed, each with theoretical error bounds: simulation-based estimators and Lanczos-based approximations that use specialized preconditioners. Experiments on simulated and real-world data show speedups of often an order of magnitude over Cholesky-based computations and better predictive accuracy than prior Vecchia-based methods. All methods are implemented in the GPBoost library. Together, these advances enable scalable, accurate inference for Vecchia-Laplace GP models, although convergence still depends on the covariance structure and the fixed effects parameters.
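To make the stochastic Lanczos quadrature idea concrete, the following is a minimal sketch of estimating the log-determinant of a symmetric positive definite matrix from matrix-vector products only. It is not the paper's or GPBoost's implementation: it uses no preconditioner, a small dense toy matrix, and Rademacher probe vectors, and the names `lanczos_tridiag` and `slq_logdet` as well as all parameter defaults are hypothetical choices made for illustration.

```python
import numpy as np

def lanczos_tridiag(matvec, v0, k):
    """Run up to k Lanczos steps; return diagonal and off-diagonal of T."""
    n = v0.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = matvec(Q[:, j])
        alpha[j] = Q[:, j] @ w
        # Full reorthogonalization against all Lanczos vectors so far;
        # this also removes the usual alpha and beta terms.
        w = w - Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j == k - 1:
            break
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-12:  # invariant subspace found, stop early
            return alpha[: j + 1], beta[:j]
        Q[:, j + 1] = w / beta[j]
    return alpha, beta

def slq_logdet(matvec, n, num_probes=50, k=20, seed=0):
    """Estimate log det(A) = tr(log A) with stochastic Lanczos quadrature."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos_tridiag(matvec, z, k)
        T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
        theta, S = np.linalg.eigh(T)
        # Gauss quadrature: z' log(A) z ~ ||z||^2 * sum_i S[0, i]^2 * log(theta_i)
        total += n * np.sum(S[0, :] ** 2 * np.log(theta))
    return total / num_probes

# Toy example (assumption: a small, well-conditioned dense SPD matrix).
rng = np.random.default_rng(1)
n = 50
G = rng.standard_normal((n, n))
A = G @ G.T / n + 2.0 * np.eye(n)

estimate = slq_logdet(lambda v: A @ v, n)
exact = np.linalg.slogdet(A)[1]
print(f"SLQ estimate: {estimate:.3f}, exact: {exact:.3f}")
```

The estimator only touches the matrix through `matvec`, which is what makes this family of methods attractive when the matrix is large and sparse. Its accuracy depends on the number of probe vectors and Lanczos steps.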

Abstract

Latent Gaussian process (GP) models are flexible probabilistic non-parametric function models. Vecchia approximations are accurate approximations for GPs to overcome computational bottlenecks for large data, and the Laplace approximation is a fast method with asymptotic convergence guarantees to approximate marginal likelihoods and posterior predictive distributions for non-Gaussian likelihoods. Unfortunately, the computational complexity of combined Vecchia-Laplace approximations grows faster than linearly in the sample size when used in combination with direct solver methods such as the Cholesky decomposition. Computations with Vecchia-Laplace approximations can thus become prohibitively slow precisely when the approximations are usually the most accurate, i.e., on large data sets. In this article, we present iterative methods to overcome this drawback. Among other things, we introduce and analyze several preconditioners, derive new convergence results, and propose novel methods for accurately approximating predictive variances. We analyze our proposed methods theoretically and in experiments with simulated and real-world data. In particular, we obtain a speed-up of an order of magnitude compared to Cholesky-based calculations and a threefold increase in prediction accuracy in terms of the continuous ranked probability score compared to a state-of-the-art method on a large satellite data set. All methods are implemented in a free C++ software library with high-level Python and R packages.

Paper Structure (45 sections, 3 theorems, 55 equations, 19 figures, 8 tables, 3 algorithms)

Introduction
Relation to existing work
Vecchia-Laplace approximations
Vecchia approximations
Vecchia-Laplace approximations
Prediction with Vecchia-Laplace approximations
Iterative methods for Vecchia-Laplace approximations
Preconditioners
VADU and LVA preconditioners
LRAC preconditioner
ZIRC preconditioner
Other preconditioners
Convergence theory
Predictive covariance matrices
Predictive (co-)variances using simulation
...and 30 more sections

Key Result

Theorem 3.1

Let $u_{l+l'}$ denote the approximate solution of $(W+\tilde{\Sigma}^{-1})u = b$ in iteration $(l + l')$, $l,l'\in \mathbb{N}$, $l<n$, of the preconditioned CG method, and let $\lambda_n(A)\leq \dots\leq \lambda_1(A)$ denote the eigenvalues of a symmetric matrix $A\in \mathbb{R}^{n\times n}$. The following holds: … where … and $M= P_{\text{VADU}}^{-\frac{1}{2}}(W+\tilde{\Sigma}^{-1})P_{\text{VADU}}^{-\frac{T}{2}}$.
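As an illustration of the setting of this theorem, here is a minimal sketch of a preconditioned CG solve of $(W+\tilde{\Sigma}^{-1})u = b$ on a toy problem. Several things are assumptions that this page does not confirm: that the Vecchia precision factors as $\tilde{\Sigma}^{-1} = B^T D^{-1} B$ with $B$ unit lower triangular and $D$ diagonal, that $W$ is diagonal, and that the preconditioner has the form $B^T (W + D^{-1}) B$ (one reading of "VADU"). Dense NumPy arrays stand in for the sparse matrices a real implementation would use, and the function names are hypothetical.

```python
import numpy as np

def pcg(matvec, b, precond_solve, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for an SPD system A u = b."""
    u = np.zeros_like(b)
    r = b.copy()                 # residual for the zero initial guess
    z = precond_solve(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        step = rz / (p @ Ap)
        u += step * p
        r -= step * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return u, it
        z = precond_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u, max_iter

# Toy problem (all values below are made up for illustration).
rng = np.random.default_rng(0)
n = 200
B = np.eye(n)                    # unit lower triangular, few entries per row
for i in range(1, n):
    for j in range(1, min(i, 3) + 1):
        B[i, i - j] = -0.3 * rng.random()
D_inv = 1.0 / rng.uniform(0.5, 1.5, size=n)   # diagonal of D^{-1}
W = rng.uniform(0.05, 0.25, size=n)           # diagonal of W

A = np.diag(W) + B.T @ (D_inv[:, None] * B)   # dense copy, only for checking
b = rng.standard_normal(n)

def matvec(v):
    # (W + B^T D^{-1} B) v without forming the matrix
    return W * v + B.T @ (D_inv * (B @ v))

def precond_solve(r):
    # Assumed preconditioner P = B^T (W + D^{-1}) B, so
    # P^{-1} r = B^{-1} (W + D^{-1})^{-1} B^{-T} r
    y = np.linalg.solve(B.T, r)
    y = y / (W + D_inv)
    return np.linalg.solve(B, y)

u, n_iter = pcg(matvec, b, precond_solve)
u_plain, n_iter_plain = pcg(matvec, b, lambda r: r)  # no preconditioner
print("iterations with / without preconditioner:", n_iter, n_iter_plain)
```

In a sparse setting, the two `np.linalg.solve` calls would be replaced by sparse triangular solves, so that applying the preconditioner costs about as much as one matrix-vector product.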

Figures (19)

Figure 1: Estimated variance parameter $\sigma^2_1$ obtained with a Vecchia-Laplace approximation vs. different sample sizes $n$ for binary data. The red rhombi represent means and the whiskers are $\pm 2 \times$ standard errors. The dashed line indicates the true parameter $\sigma^2_1=1$.
Figure 2: Negative log-marginal likelihood and runtime for different preconditioners and numbers of random vectors $t$. The dashed lines are the results for the Cholesky decomposition.
Figure 3: Comparison of simulation- and Lanczos-based methods for predictive variances. The number of random vectors $s$ and the Lanczos rank $k$ are annotated in the plot.
Figure 4: Estimated marginal variance $\sigma_1^2$ and range $\rho$ parameter. The red rhombi represent means. The dotted lines indicate the true values.
Figure 5: RMSE for predictive means and log score (LS) for probabilistic predictions. For GPVecchia, the log score could not be calculated due to negative predictive variances.
...and 14 more figures

Theorems & Definitions (6)

Theorem 3.1
Theorem 3.2
Proposition 3.3
Proof of Theorem 3.1
Proof of Theorem 3.2
Proof of Proposition 3.3