Fast Evaluation of Additive Kernels: Feature Arrangement, Fourier Methods, and Kernel Derivatives

Theresa Wagner, Franziska Nestler, Martin Stoll

TL;DR

This work tackles the computational bottleneck of dense kernel matrices by combining feature-space partitioning with the non-equispaced fast Fourier transform (NFFT) to accelerate additive Gaussian kernels and their derivatives. It introduces a flexible framework for arranging features into low-dimensional windows, supported by selection, regularization, clustering, and optimization strategies, and gives an explicit NFFT-based method for both kernel evaluations and derivative computations. The authors derive rigorous Fourier error bounds for the Gaussian kernel and its derivative kernel, and develop a global-sensitivity-analysis approach in the Fourier domain to guide window construction. Numerical experiments on large-scale benchmarks show that additive kernel ridge regression with NFFT acceleration can outperform full-kernel baselines in both accuracy (RMSE) and efficiency, and that the approach extends to Matérn-type kernels. The resulting additive models are scalable and interpretable, are suitable for GP/SVM contexts, and are of practical use in large-data regimes and hyperparameter optimization.

Abstract

One of the main computational bottlenecks when working with kernel-based learning is dealing with the large and typically dense kernel matrix. Techniques for fast approximation of the matrix-vector product with these kernel matrices typically deteriorate in performance if the feature vectors reside in higher-dimensional feature spaces. We here present a technique based on the non-equispaced fast Fourier transform (NFFT) with rigorous error analysis. We show that this approach is also well suited to approximating the matrix that arises when the kernel is differentiated with respect to the kernel hyperparameters, a problem often found in the training phase of methods such as Gaussian processes. We also provide an error analysis for this case. We illustrate the performance of the additive kernel scheme with fast matrix-vector products on a number of data sets. Our code is available at https://github.com/wagnertheresa/NFFTAddKer
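
To make the objects in the abstract concrete, the following is a minimal dense reference sketch in Python, not the authors' NFFT implementation. It assumes the additive kernel is an unweighted sum of Gaussian sub-kernels, one per feature window, with a single shared length-scale `ell`. The function names, the window lists, and the random data are hypothetical and chosen only for illustration. The dense product costs O(N^2) per window, which is the cost that the NFFT-based approach is designed to avoid.

```python
import numpy as np


def sq_dists(Xw):
    """Pairwise squared Euclidean distances between the rows of Xw."""
    sq = np.sum(Xw**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Xw @ Xw.T)
    return np.maximum(d2, 0.0)  # clip tiny negative values caused by rounding


def additive_gaussian_kernel(X, windows, ell):
    """Sum of Gaussian sub-kernels, one per feature window (assumed unweighted)."""
    N = X.shape[0]
    K = np.zeros((N, N))
    for w in windows:
        K += np.exp(-sq_dists(X[:, w]) / (2.0 * ell**2))
    return K


def additive_gaussian_kernel_dell(X, windows, ell):
    """Derivative of the additive kernel matrix with respect to ell."""
    N = X.shape[0]
    dK = np.zeros((N, N))
    for w in windows:
        d2 = sq_dists(X[:, w])
        # d/d ell exp(-r^2 / (2 ell^2)) = exp(-r^2 / (2 ell^2)) * r^2 / ell^3
        dK += np.exp(-d2 / (2.0 * ell**2)) * d2 / ell**3
    return dK


# Illustrative data: 50 points, 6 features, two windows of 3 features each.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
windows = [[0, 1, 2], [3, 4, 5]]
ell = 1.0

K = additive_gaussian_kernel(X, windows, ell)
dK = additive_gaussian_kernel_dell(X, windows, ell)
v = rng.standard_normal(50)
Kv = K @ v  # the dense matrix-vector product that a fast method would replace
```

A dense reference like this is mainly useful for checking a fast approximation on small problems, for example by comparing `K @ v` against the approximate product for a few random vectors `v`.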

Paper Structure

This paper contains 30 sections, 3 theorems, 111 equations, 13 figures, 1 table, and 1 algorithm.

Key Result

Lemma 1

(Aliasing error) Let $f \in L_2(\mathbb{T}^3)$ be a function with absolutely convergent Fourier series, and let an approximation of $f$ be given by replacing the analytic Fourier coefficients with the discrete Fourier coefficients computed from $m$ equidistant samples on the grid $\mathcal{I}_m$. Then the approximation error can be estimated for all $\bm{x}$ ...

Figures (13)

  • Figure 1: An even function of the form $f(r)=\exp(-r^2/(2\ell^2))$ (left) or $f(r)=\exp(-r^2/(2\ell^2))\cdot(r^2/(2\ell^2))$ (right), defined on $[-1/2,1/2]$, is periodized via simple periodic continuation. The resulting periodic function is at least continuous, but in general not smooth. A finite number of approximating Fourier coefficients can be obtained by sampling the function at equidistant points (marked by the dots) and applying the FFT. Alternatively, one can make use of the analytic Fourier coefficients, provided they are known.
  • Figure 1: Comparison of RMSE, window size and runtime for the additive KRR model for different GSI scores with $N=1000$, $d_{\text{max}}=3$, $N_{\text{feat}}=d$ and initial $\ell=1$ and $\beta=1$.
  • Figure 1: Comparison of RMSE, window setup time and time for fitting and predicting the additive KRR model with the corresponding windows for different feature arrangement techniques and strategies, fixed number of total features included $N_{\text{feat}}=2d/3$ and different maximal window length $d_{\text{max}}$ for the KEGGundir data set.
  • Figure 2: RMSE surface for additive kernel ridge regression and different length-scale and regularization parameters $\ell$ and $\beta$, where $N=1000$ and the windows are determined consecutively via MIS ranking.
  • Figure 2: Comparison of RMSE, window setup time and time for fitting and predicting the additive KRR model with the corresponding windows for different feature arrangement techniques and strategies, fixed $d_{\text{max}}=3$ and different number of total features included $N_{\text{feat}}$ for the KEGGundir data set.
  • ...and 8 more figures
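
The periodization step described in the first Figure 1 caption can be sketched as follows. This is a one-dimensional illustration using a plain FFT and direct evaluation of the trigonometric polynomial, not the NFFT and not the authors' code. The function names, the grid size `m = 64`, and the length-scale `ell = 0.05` are assumptions chosen so that the Gaussian is negligibly small at the interval boundary; `m` is assumed to be even.

```python
import numpy as np


def discrete_fourier_coeffs(f, m):
    """Approximate Fourier coefficients of the 1-periodic continuation of f.

    Samples f at the m equidistant points j/m, j = -m/2, ..., m/2 - 1,
    and returns frequencies k = -m/2, ..., m/2 - 1 with coefficients c_k.
    """
    j = np.arange(-m // 2, m // 2)
    samples = f(j / m)
    # ifftshift moves the sample at r = 0 to index 0, fft computes the sums,
    # and fftshift reorders the output so that frequencies are centered.
    c = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(samples))) / m
    return j, c  # the frequency index set coincides with the sample index set


def evaluate_trig_poly(k, c, x):
    """Evaluate sum_k c_k exp(2 pi i k x) at arbitrary points x (real part)."""
    return np.real(np.exp(2j * np.pi * np.outer(x, k)) @ c)


ell = 0.05
m = 64


def f(r):
    return np.exp(-r**2 / (2.0 * ell**2))


k, c = discrete_fourier_coeffs(f, m)
x = np.linspace(-0.5, 0.5, 1001, endpoint=False)  # off-grid evaluation points
approx = evaluate_trig_poly(k, c, x)
max_err = np.max(np.abs(approx - f(x)))
```

For larger length-scales the periodic continuation has a visible kink at the boundary, the Fourier coefficients decay more slowly, and `max_err` grows for fixed `m`. This is the kind of effect an aliasing-error estimate quantifies.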

Theorems & Definitions (8)

  • Lemma 1
  • Proof 1
  • Theorem 2
  • Proof 2
  • Remark 3
  • Theorem 4
  • Proof 3
  • Remark 5