Preconditioned Additive Gaussian Processes with Fourier Acceleration

Theresa Wagner, Tianshi Xu, Franziska Nestler, Yuanzhe Xi, Martin Stoll

TL;DR

This work tackles the computational bottlenecks of Gaussian processes by combining a matrix-free, NFFT-based acceleration for kernel-vector multiplications with an additive-kernel construction that reduces effective dimensionality. An AAFN preconditioner tailored to additive kernels speeds up hyperparameter optimization and stabilizes stochastic trace estimates, enabling scalable GP training on higher-dimensional data. The authors provide rigorous Fourier-approximation error bounds for Matérn kernels and show that the derivative computations used in learning are consistent with the Fourier approximations. Numerical experiments on synthetic and real data demonstrate comparable predictive performance and uncertainty quantification to exact methods, with substantial gains in efficiency and scalability.
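The stochastic trace estimates mentioned above can be illustrated with a Hutchinson-type estimator, which needs only matrix-vector products. This is a minimal sketch under stated assumptions: the helper name `hutchinson_trace`, the Rademacher probes, and the random test matrix are illustrative choices, not taken from the paper.

```python
import numpy as np

def hutchinson_trace(matvec, n, num_probes=64, seed=0):
    """Estimate tr(A) using only products v -> A v (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        total += z @ matvec(z)               # z^T A z has expectation tr(A)
    return total / num_probes

# Illustrative symmetric positive semidefinite test matrix (an assumption,
# not data from the paper).
rng = np.random.default_rng(1)
B = rng.standard_normal((50, 50))
A = B @ B.T / 50.0

estimate = hutchinson_trace(lambda v: A @ v, n=50, num_probes=400)
exact = np.trace(A)
print(f"estimate = {estimate:.3f}, exact = {exact:.3f}")
```

The variance of each sample is governed by the off-diagonal entries of the matrix, so a matrix closer to the identity yields a more stable estimate for the same number of probes.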

Abstract

Gaussian processes (GPs) are crucial in machine learning for quantifying uncertainty in predictions. However, their associated covariance matrices, defined by kernel functions, are typically dense and large-scale, posing significant computational challenges. This paper introduces a matrix-free method that utilizes the Non-equispaced Fast Fourier Transform (NFFT) to achieve nearly linear complexity in the multiplication of kernel matrices and their derivatives with vectors for a predetermined accuracy level. To address high-dimensional problems, we propose an additive kernel approach. Each sub-kernel in this approach captures lower-order feature interactions, allowing for the efficient application of the NFFT method and potentially increasing accuracy across various real-world datasets. Additionally, we implement a preconditioning strategy that accelerates hyperparameter tuning, further improving the efficiency and effectiveness of GPs.
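The additive construction described above can be sketched as a sum of low-dimensional sub-kernels that is only ever accessed through matrix-vector products. The sketch below makes several assumptions: the Gaussian kernel is parametrized as $\exp(-\|x-y\|^2/(2\ell^2))$, the feature windows and hyperparameter values are arbitrary, and dense sub-kernel matrices stand in for the NFFT-based fast multiplication, which is not implemented here. The names `gaussian_kernel`, `additive_matvec`, and `conjugate_gradient` are hypothetical.

```python
import numpy as np

def gaussian_kernel(X, ell):
    """Dense Gaussian kernel matrix exp(-||x - y||^2 / (2 ell^2)) (assumed form)."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * ell**2))

def additive_matvec(X, windows, ell, sigma_f2, sigma_eps2):
    """Return v -> (sigma_f2 * sum_s K_s + sigma_eps2 * I) v.

    Each K_s is built on the features in one window. The dense products
    K @ v are a stand-in for a fast, matrix-free multiplication.
    """
    subkernels = [gaussian_kernel(X[:, w], ell) for w in windows]

    def matvec(v):
        return sigma_f2 * sum(K @ v for K in subkernels) + sigma_eps2 * v

    return matvec

def conjugate_gradient(matvec, b, tol=1e-6, max_iter=5000):
    """Plain (unpreconditioned) CG; returns the solution and iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

# Illustrative setup: 200 points in R^6 split into three 2D windows.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 6))
windows = [[0, 1], [2, 3], [4, 5]]
matvec = additive_matvec(X, windows, ell=0.5, sigma_f2=1.0 / 3.0, sigma_eps2=0.01)

b = rng.uniform(-0.5, 0.5, size=200)
x, iters = conjugate_gradient(matvec, b)
print(f"CG iterations: {iters}")
```

Because each sub-kernel acts on only two features, a fast summation method applied per window never has to handle more than a two-dimensional problem, regardless of the total number of features.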

Paper Structure

This paper contains 20 sections, 4 theorems, 63 equations, 8 figures, 3 tables.

Key Result

Lemma 4.2

For $\bm r\in[-\frac{1}{2},\frac{1}{2})^3$ and the trivariate Matérn kernel $\kappa^\mathrm{m}$, we have [...], where the quantity $\delta^\mathrm{m}(\ell)$ is given by [...], which becomes negligibly small for small $\ell$.

Figures (8)

  • Figure 1: Left: Iteration counts of unpreconditioned CG to solve linear systems for $20$ regularized additive Gaussian kernel matrices with the same random right-hand side to reach a relative residual tolerance $10^{-3}$. These $20$ matrices are associated with the same $1000$ points in $\mathbb{R}^6$ and fixed $\sigma_f^2=\frac{1}{P}$, $\sigma_\varepsilon^2=0.01$, but different length-scales $\ell$. The six features are split into three two-dimensional windows. Each window is sampled randomly within a circle of radius $\sqrt{\frac{1000}{\pi}}$. Right: Spectra of the $20$ regularized additive Gaussian kernel matrices.
  • Figure 1: Visualization in 1D: The original kernel function $\kappa$ (left), the periodically continued kernel function $\kappa_\text{R}$ (middle) and its Fourier approximation $\kappa_\text{RF}$ (right). The Fourier approximation $\kappa_\text{RF}$ is a trigonometric polynomial interpolating $m$ (here $m=8$) equidistant samples of the kernel function (dots).
  • Figure 1: Visualization in 1D: The original kernel function $\kappa(r)={\mathrm e}^{-|r|/\ell}$ with $\ell=0.2$ (left) and its 1-periodic periodization $\tilde{\kappa}$ (right). The Fourier coefficients of the periodization are given by the Fourier transform of $\kappa$, evaluated at integer values. The difference between $\kappa$ and $\tilde{\kappa}$ is small when $\ell$ remains small.
  • Figure 1: Comparison of iteration counts for CG vs AAFN preconditioned CG to reach $10^{-4}$ relative residual tolerance, as a function of length scale $\ell$. Plotted for Gaussian and Matérn kernels using a synthetic $\mathbb{R}^6$ dataset formed with $3000$ points sampled uniformly at random within a hypercube of side length $\sqrt[3]{3000}$. The right-hand side vector elements were sampled uniformly from $[-0.5, 0.5]$, using a zero initial guess.
  • Figure 2: Comparison of the measured true Fourier approximation errors (solid lines) for the periodically continued kernels $\kappa_\text{R}$ and the corresponding error estimators (dashed lines) as stated in Theorems 4.4 and 4.5 for the periodized kernels $\tilde{\kappa}$ in three dimensions. The results for the Matérn kernel are shown in the first row of the plot and those for the derivative Matérn kernel in the second row. The grid size $m$ is fixed to $m=16$, $m=32$, or $m=64$ (from left to right).
  • ...and 3 more figures
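
The 1-periodic periodization in the captions above can be checked numerically in 1D. The sketch below assumes the Fourier transform convention $\hat\kappa(\omega)=\int \kappa(r)\,{\mathrm e}^{-2\pi\mathrm{i}\omega r}\,\mathrm{d}r$, under which $\kappa(r)={\mathrm e}^{-|r|/\ell}$ has $\hat\kappa(\omega)=2\ell/(1+4\pi^2\ell^2\omega^2)$. It compares a truncated lattice sum $\sum_n \kappa(r+n)$ with the truncated Fourier series whose coefficients are $\hat\kappa(k)$ at integers $k$. The function names and truncation parameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def kappa(r, ell):
    """1D kernel exp(-|r| / ell) from the figure caption."""
    return np.exp(-np.abs(r) / ell)

def periodization(r, ell, n_max=50):
    """Truncated lattice sum: sum over n of kappa(r + n)."""
    shifts = np.arange(-n_max, n_max + 1)
    return kappa(r[:, None] + shifts[None, :], ell).sum(axis=1)

def fourier_series(r, ell, k_max=2000):
    """Truncated Fourier series with coefficients hat{kappa}(k) at integers k.

    The coefficients are even in k, so the complex exponentials reduce
    to cosines and the sum is real.
    """
    k = np.arange(-k_max, k_max + 1)
    coeffs = 2.0 * ell / (1.0 + (2.0 * np.pi * ell * k) ** 2)
    return (coeffs[None, :] * np.cos(2.0 * np.pi * np.outer(r, k))).sum(axis=1)

r = np.linspace(-0.5, 0.5, 201, endpoint=False)

# Both representations of the periodization should agree up to truncation.
series_err = np.max(np.abs(periodization(r, 0.2) - fourier_series(r, 0.2)))

# Gap between the kernel and its periodization for two length-scales.
gap_large = np.max(np.abs(periodization(r, 0.2) - kappa(r, 0.2)))
gap_small = np.max(np.abs(periodization(r, 0.05) - kappa(r, 0.05)))

print(f"series vs lattice sum: {series_err:.2e}")
print(f"gap for ell=0.2: {gap_large:.2e}, gap for ell=0.05: {gap_small:.2e}")
```

The gap is dominated by the nearest shifted copy of the kernel at the interval boundary, which decays like ${\mathrm e}^{-1/(2\ell)}$ for this kernel, so it shrinks quickly as $\ell$ decreases.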

Theorems & Definitions (10)

  • Definition 4.1: $1$-Periodic Periodization
  • Lemma 4.2
  • Proof 1
  • Lemma 4.3
  • Proof 2
  • Theorem 4.4: Fourier Error Estimate for the Periodized Trivariate Matérn Kernel
  • Proof 3
  • Theorem 4.5: Fourier Error Estimate for the Periodized Trivariate Derivative Matérn Kernel
  • Proof 4
  • Remark 4.6