Table of Contents
Fetching ...

gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points

Marcus M. Noack, Mark D. Risser, Hengrui Luo, Vardaan Tekriwal, Ronald J. Pandolfi

TL;DR

This work tackles the scalability of exact Gaussian process regression to millions of data points by introducing gp2Scale, a framework built on three pillars: (i) flexible non-stationary, compactly supported kernels that reveal sparse covariance structure, (ii) distributed computing to efficiently assemble and manipulate large covariance blocks, and (iii) a block-MCMC scheme for rapid, regularized hyperparameter sampling. The authors present a family of kernels, including non-stationary Wendland- and bump-function-based constructions, with mechanisms to preserve far-field interactions while maintaining sparsity. Through extensive experiments on synthetic, topography, housing, MNIST, and a 10-million-temperature dataset, gp2Scale demonstrates competitive or superior accuracy and uncertainty quantification compared to leading approximations (SVGP, VNNGP, SKI, Vecchia), while enabling exact inference. The results, supported by a public gpCAM package and reproducibility commitments, suggest gp2Scale is especially advantageous for highly customizable GPs on dense, non-stationary data where traditional approximations may overly constrain kernel design or uncertainty modeling.

Abstract

Despite a large corpus of recent work on scaling up Gaussian processes, a stubborn trade-off between computational speed, prediction and uncertainty quantification accuracy, and customizability persists. This is because the vast majority of existing methodologies exploit various levels of approximations that lower accuracy and limit the flexibility of kernel and noise-model designs -- an unacceptable drawback at a time when expressive non-stationary kernels are on the rise in many fields. Here, we propose a methodology we term \emph{gp2Scale} that scales exact Gaussian processes to more than 10 million data points without relying on inducing points, kernel interpolation, or neighborhood-based approximations, and instead leveraging the existing capabilities of a GP: its kernel design. Highly flexible, compactly supported, and non-stationary kernels lead to the identification of naturally occurring sparse structure in the covariance matrix, which is then exploited for the calculations of the linear system solution and the log-determinant for training. We demonstrate our method's functionality on several real-world datasets and compare it with state-of-the-art approximation algorithms. Although we show superior approximation performance in many cases, the method's real power lies in its agnosticism toward arbitrary GP customizations -- core kernel design, noise, and mean functions -- and the type of input space, making it optimally suited for modern Gaussian process applications.

gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points

TL;DR

This work tackles the scalability of exact Gaussian process regression to millions of data points by introducing gp2Scale, a framework built on three pillars: (i) flexible non-stationary, compactly supported kernels that reveal sparse covariance structure, (ii) distributed computing to efficiently assemble and manipulate large covariance blocks, and (iii) a block-MCMC scheme for rapid, regularized hyperparameter sampling. The authors present a family of kernels, including non-stationary Wendland- and bump-function-based constructions, with mechanisms to preserve far-field interactions while maintaining sparsity. Through extensive experiments on synthetic, topography, housing, MNIST, and a 10-million-temperature dataset, gp2Scale demonstrates competitive or superior accuracy and uncertainty quantification compared to leading approximations (SVGP, VNNGP, SKI, Vecchia), while enabling exact inference. The results, supported by a public gpCAM package and reproducibility commitments, suggest gp2Scale is especially advantageous for highly customizable GPs on dense, non-stationary data where traditional approximations may overly constrain kernel design or uncertainty modeling.

Abstract

Despite a large corpus of recent work on scaling up Gaussian processes, a stubborn trade-off between computational speed, prediction and uncertainty quantification accuracy, and customizability persists. This is because the vast majority of existing methodologies exploit various levels of approximations that lower accuracy and limit the flexibility of kernel and noise-model designs -- an unacceptable drawback at a time when expressive non-stationary kernels are on the rise in many fields. Here, we propose a methodology we term \emph{gp2Scale} that scales exact Gaussian processes to more than 10 million data points without relying on inducing points, kernel interpolation, or neighborhood-based approximations, and instead leveraging the existing capabilities of a GP: its kernel design. Highly flexible, compactly supported, and non-stationary kernels lead to the identification of naturally occurring sparse structure in the covariance matrix, which is then exploited for the calculations of the linear system solution and the log-determinant for training. We demonstrate our method's functionality on several real-world datasets and compare it with state-of-the-art approximation algorithms. Although we show superior approximation performance in many cases, the method's real power lies in its agnosticism toward arbitrary GP customizations -- core kernel design, noise, and mean functions -- and the type of input space, making it optimally suited for modern Gaussian process applications.

Paper Structure

This paper contains 35 sections, 1 theorem, 13 equations, 3 figures, 5 tables.

Key Result

Theorem 1

Let $k( x _1, x _2)$ be a valid kernel, then $f( x _1)f( x _2)k( x _1, x _2)$ is also a valid kernel. Here, $f( x )$ is an arbitrary function over the input set.

Figures (3)

  • Figure 1: Approximation performance of VNNGP, SVGP, SKI, Vecchia, gp2Scale, and a base regular GP for comparison, which is, in most examples we consider, computationally prohibitive. The ground truth is depicted in blue; red dots represent the training data, and the posterior mean is shown in black. VNNGP, SVGP, and SKI oversmooth and cannot adequately recover local oscillations. The Vecchia approximation shows some local oscillation, although degraded. gp2Scale best preserves local variations and sharp transitions, and offers reliable uncertainty quantification (UQ) very similar to the base regular GP. Note that for all figures the posterior standard deviation of $p( \textbf{f} )$ is displayed, not of $p( \textbf{y} )$, leading to vanishingly small uncertainties for the regular GP and gp2Scale, which is expected given the dense data distribution. The competing methods are clearly overestimating the uncertainty. The noise, in this example, is homoscedastic and constant across the tested methodologies.
  • Figure 2: Compute pipeline. The dataset is divided into approximately equal chunks. Those chunks are sent to (DASK) compute workers, which calculate a dense square block of the covariance matrix. The block will be cast to sparse COO format before being shipped back to the host, where all blocks will be collected and assembled into the full sparse covariance matrix. This simple pipeline allows us to calculate truly massive covariance matrices in a reasonable amount of time. The kernels presented in this paper allow the discovery of sparsity in the covariance matrix, enabling downstream operations to be comparably cheap.
  • Figure 3: Graphical illustration of the covariance matrix for a one-dimensional problem using kernel \ref{['eq:bump_kernel']} for $U=1$ and $k_{core} = 1$, In particular, the far-field term $g( x _i) g( x _j)$ has rank 1 and therefore the associated interactions can only be turned "on" or "off" which is highlighted by a box pattern in the covariance matrix

Theorems & Definitions (2)

  • Theorem 1
  • proof