Gaussian process regression with log-linear scaling for common non-stationary kernels

P. Michael Kielstra; Michael Lindsey

TL;DR

The paper develops a fast kernel matrix-vector multiplication (matvec) framework for Gaussian process regression (GPR) with non-stationary kernels in low dimensions, extending the equispaced Fourier Gaussian process (EFGP) approach to spatially varying scales via the Schoenberg representation. By decomposing the kernel into a sum of Gaussian convolutions and combining Chebyshev interpolation in $\sigma$ with NUFFT-based Fourier discretization, it achieves near-linear $O(N\log N)$ matvec cost, with error that converges exponentially in the tunable parameters. The authors provide a rigorous error analysis that controls the overall error $\Vert\mathcal{K}-\tilde{\mathcal{K}}\Vert$ in terms of the discretization and interpolation errors, and they report numerical experiments showing favorable scaling against state-of-the-art rank-structured methods. The method enables efficient conjugate gradient (CG) solves for GPR with non-stationary Matérn kernels and improves scalability in spatial dimension $d>1$, which may benefit applications that require flexible, non-stationary kernel design.
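To make the "sum of Gaussians" idea concrete, here is a minimal sketch for the simplest stationary case, the exponential kernel $e^{-r}$ (Matérn with $\nu=\frac{1}{2}$), which satisfies $e^{-r}=\frac{1}{2\sqrt{\pi}}\int_0^\infty t^{-3/2}e^{-1/(4t)}e^{-r^2 t}\,dt$ for $r>0$. The function name `exp_kernel_via_gaussians`, the substitution $t=e^{s}$, and the grid parameters `s_min`, `s_max`, `h` are illustrative assumptions; the paper's own quadrature in $t$ and its parameter $N_t$ may be chosen differently.

```python
import math


def exp_kernel_via_gaussians(r, s_min=-8.0, s_max=12.0, h=0.25):
    """Approximate exp(-r), r > 0, by a finite weighted sum of Gaussians exp(-r^2 * t_k).

    Uses the substitution t = exp(s) and an equispaced rule in s.
    The grid parameters are illustrative assumptions, not the paper's choices.
    """
    n = int(round((s_max - s_min) / h)) + 1
    total = 0.0
    for k in range(n):
        s = s_min + k * h
        t = math.exp(s)
        # Mixing density: t^{-3/2} * exp(-1/(4t)) / (2 sqrt(pi)).
        # The Jacobian dt = t ds turns t^{-3/2} into t^{-1/2} = exp(-s/2).
        weight = h * math.exp(-0.5 * s) * math.exp(-0.25 / t) / (2.0 * math.sqrt(math.pi))
        # Each term is a Gaussian in r with inverse squared width t.
        total += weight * math.exp(-r * r * t)
    return total


if __name__ == "__main__":
    for r in (0.1, 1.0, 3.0):
        print(r, exp_kernel_via_gaussians(r), math.exp(-r))
```

Because the integrand decays very rapidly at both ends of the $s$ grid, a modest number of nodes already reproduces $e^{-r}$ closely for the distances printed above; smaller $r$ would need a larger `s_max`.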

Abstract

We introduce a fast algorithm for Gaussian process regression in low dimensions, applicable to a widely used family of non-stationary kernels. The non-stationarity of these kernels is induced by arbitrary spatially varying vertical and horizontal scales. In particular, any stationary kernel can be accommodated as a special case, and we focus especially on the generalization of the standard Matérn kernel. Our subroutine for kernel matrix-vector multiplications scales almost optimally as $O(N\log N)$, where $N$ is the number of regression points. Like the recently developed equispaced Fourier Gaussian process (EFGP) methodology, which is applicable only to stationary kernels, our approach exploits non-uniform fast Fourier transforms (NUFFTs). We offer a complete analysis controlling the approximation error of our method, and we validate the method's practical performance with numerical experiments. In particular, we demonstrate improved scalability compared to state-of-the-art rank-structured approaches in spatial dimension $d>1$.
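The reason a fast matvec matters is that an iterative solver such as conjugate gradients (CG) touches the kernel matrix only through products with vectors. The sketch below is an assumption-laden illustration, not the paper's algorithm: it solves $(K+\eta I)\alpha=y$ with a hand-written CG loop, using a dense $O(N^2)$ stationary Matérn $\nu=\frac{3}{2}$ matvec as a stand-in for the $O(N\log N)$ subroutine. The names `matern32`, `make_matvec`, and `conjugate_gradient`, the length scale `ell`, the noise variance `noise_var`, and the synthetic data are all hypothetical.

```python
import math


def matern32(r, ell=0.3):
    """Stationary Matern kernel with nu = 3/2 and unit variance."""
    a = math.sqrt(3.0) * r / ell
    return (1.0 + a) * math.exp(-a)


def make_matvec(xs, noise_var):
    """Return v -> (K + noise_var * I) v.

    Dense O(N^2) stand-in; a fast method would replace only this function.
    """
    def matvec(v):
        out = []
        for i, xi in enumerate(xs):
            s = noise_var * v[i]
            for xj, vj in zip(xs, v):
                s += matern32(abs(xi - xj)) * vj
            out.append(s)
        return out
    return matvec


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=500):
    """Solve A x = b for symmetric positive definite A, accessed only via matvec."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    for _ in range(max_iter):
        if math.sqrt(rs) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Synthetic one-dimensional regression problem (illustrative data only).
N = 40
xs = [i / (N - 1) for i in range(N)]
ys = [math.sin(2.0 * math.pi * x) for x in xs]
matvec = make_matvec(xs, noise_var=1e-2)
alpha = conjugate_gradient(matvec, ys)
residual = max(abs(a - y) for a, y in zip(matvec(alpha), ys))
```

The solver never forms or factorizes the matrix, so swapping the dense `matvec` for a faster one changes the cost per iteration without changing the CG loop.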

Paper Structure (27 sections, 7 theorems, 150 equations, 9 figures)

Introduction
Outline
Acknowledgments
Preliminaries
Approximation of the kernel
Analytical integration in $t$
Numerical integration in $t$
Chebyshev interpolation in $\sigma$
Fourier discretization
Summary
Fast algorithm for kernel matrix-vector multiplication
Derivation
Explicit algorithm and cost scaling
Error analysis
Stage (1)
...and 12 more sections

Key Result

Lemma 3

For any $N\times N$ matrix $A$ and any $p\in[1,\infty]$, we have $\Vert A\Vert_{p}\leq N\,\vert\!\vert\!\vert A\vert\!\vert\!\vert$.
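The triple-bar norm is not defined in this excerpt. The sketch below assumes it denotes the largest absolute entry of $A$, which is an assumption rather than something the excerpt confirms. Under that reading, the bound can be checked directly for $p=1$ and $p=\infty$, and for $p=2$ via the Frobenius norm, which dominates the spectral norm. The helper names and the random test matrix are hypothetical.

```python
import math
import random


def max_entry(A):
    """Largest absolute entry (assumed meaning of the triple-bar norm)."""
    return max(abs(a) for row in A for a in row)


def norm_1(A):
    """Induced 1-norm: maximum absolute column sum."""
    n = len(A)
    return max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))


def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def norm_fro(A):
    """Frobenius norm, an upper bound on the induced 2-norm."""
    return math.sqrt(sum(a * a for row in A for a in row))


random.seed(0)
N = 6
A = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(N)]
bound = N * max_entry(A)

print("bound      :", bound)
print("norm_1     :", norm_1(A))
print("norm_inf   :", norm_inf(A))
print("norm_fro   :", norm_fro(A))
```

Each column or row sum has $N$ terms, each at most the largest entry in absolute value, and the Frobenius norm sums $N^2$ squared entries, so all three quantities stay below `bound` for any square matrix.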

Figures (9)

Figure 6.1: Matvec time and accuracy in one dimension for the Matérn kernel with $\nu=\frac{3}{2}$. Unless otherwise specified, $N_{t}=20$, $N_{\sigma}=20$, and $M=400$. Results are averaged over 50 runs.
Figure 6.2: Matvec time and accuracy in two dimensions for the Matérn kernel with $\nu=\frac{3}{2}$. Unless otherwise specified, $N_{t}=20$, $N_{\sigma}=20$, and $M=200$ ($M^{2}=40\,000$). Results are averaged over 3 runs.
Figure 6.3: Matvec time and accuracy with varying $N$ in $d=1$ (left) and $d=2$ (right) for the Matérn kernel with $\nu=\frac{3}{2}$. The reference values for $N_{t}$, $N_{\sigma}$, and $M$ are the same as in Figures 6.1 and 6.2, respectively, as are the numbers of trial runs.
Figure 6.4: Matvec time and accuracy in one dimension for the squared-exponential kernel. Unless otherwise specified, $N_{\sigma}=26$ and $M=100$. Results are averaged over 5 runs.
Figure 6.5: Matvec time and accuracy in two dimensions for the squared-exponential kernel. Unless otherwise specified, $N_{\sigma}=26$ and $M=140$. Results are averaged over 3 runs.
...and 4 more figures

Theorems & Definitions (21)

Remark 1
Definition 2
Lemma 3
Lemma 4
Definition 5
Lemma 6
Definition 7
Definition 8
Definition 9
Lemma 10
...and 11 more
