Fast Summation of Radial Kernels via QMC Slicing

Johannes Hertrich, Tim Jahn, Michael Quellmalz

TL;DR

This paper tackles the fast computation of large radial kernel sums $s_m = \sum_{n=1}^N w_n K(x_n, y_m)$ by projecting the data onto $P$ directions on the sphere and expressing $K$ through a one-dimensional basis function $f$, which reduces the problem to fast one-dimensional summations. This core idea, slicing, is combined with quasi-Monte Carlo (QMC) designs on the sphere to improve convergence beyond the standard $O(1/\sqrt{P})$ rate, supported by smoothness results and exact variance calculations for several kernels. The authors prove error bounds for uniformly distributed slices and show that certain kernels (Gauss, Laplace, Matérn, Riesz, and thin plate spline) admit dimension-independent or otherwise favorable rate bounds, with QMC designs achieving $O(P^{-s/(d-1)})$ rates in suitable Sobolev spaces. Extensive numerical experiments show that QMC-slicing substantially outperforms (QMC-)random Fourier features, orthogonal Fourier features, and non-QMC slicing on common datasets, particularly in moderate dimensions ($d$ up to about 100) and for smooth kernels. The approach offers a flexible, scalable framework for fast kernel summation, including for non-positive definite kernels, with applications in kernel methods, MMD flows, and related high-dimensional data analysis tasks.
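
For orientation, the slicing idea can be written out in the notation of the Figure 1 caption and Theorem 1. This is a sketch that assumes the radial kernel has the form $K(x,y)=F(\|x-y\|)$ and that $\xi$ is uniformly distributed on the sphere $\mathbb{S}^{d-1}$; the exact statement of the slicing relation in the paper is not reproduced on this page, so the formulation below is an assumption about its form.

```latex
F(\|x\|)
  = \mathbb{E}_{\xi\sim\mathcal{U}(\mathbb{S}^{d-1})}\bigl[f(|\langle\xi,x\rangle|)\bigr]
  \approx \frac{1}{P}\sum_{p=1}^{P} f\bigl(|\langle\xi_p,x\rangle|\bigr),
\qquad
s_m \approx \frac{1}{P}\sum_{p=1}^{P}\sum_{n=1}^{N}
  w_n\, f\bigl(|\langle\xi_p,x_n\rangle-\langle\xi_p,y_m\rangle|\bigr).
```

Each inner sum over $n$ involves only the one-dimensional projections $\langle\xi_p,x_n\rangle$ and $\langle\xi_p,y_m\rangle$, which is what makes fast one-dimensional summation applicable.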

Abstract

The fast computation of large kernel sums is a challenging task, which arises as a subproblem in any kernel method. We approach the problem by slicing, which relies on random projections to one-dimensional subspaces and fast Fourier summation. We prove bounds for the slicing error and propose a quasi-Monte Carlo (QMC) approach for selecting the projections based on spherical quadrature rules. Numerical examples demonstrate that our QMC-slicing approach significantly outperforms existing methods like (QMC-)random Fourier features, orthogonal Fourier features or non-QMC slicing on standard test datasets.
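
To make the slicing step concrete, the following is a minimal Python sketch under several assumptions that are not taken from this page. It uses the negative distance kernel $F(t)=-t$ (the kernel of Figure 4), for which the sliced function is $f(t)=-c_d\,t$ with $c_d=\sqrt{\pi}\,\Gamma(\tfrac{d+1}{2})/\Gamma(\tfrac{d}{2})$, because the mean of $|\langle\xi,x\rangle|$ over uniform $\xi$ equals $\|x\|/c_d$. It draws i.i.d. uniform directions, that is, the non-QMC baseline rather than the spherical quadrature rules proposed in the paper, and it replaces fast Fourier summation by an exact sort-based one-dimensional summation that works for this particular kernel. The function names `abs_sum_1d` and `sliced_negative_distance_sum` are hypothetical.

```python
import numpy as np
from math import exp, lgamma, pi, sqrt


def abs_sum_1d(a, w, b):
    """Return s[m] = sum_n w[n] * |a[n] - b[m]| via sorting and prefix sums."""
    order = np.argsort(a)
    a_sorted, w_sorted = a[order], w[order]
    # Prefix sums of the weights and of weight * position, with a leading zero.
    cw = np.concatenate(([0.0], np.cumsum(w_sorted)))
    cwa = np.concatenate(([0.0], np.cumsum(w_sorted * a_sorted)))
    # k[m] = number of source points a[n] with a[n] <= b[m].
    k = np.searchsorted(a_sorted, b, side="right")
    w_le, wa_le = cw[k], cwa[k]
    w_gt, wa_gt = cw[-1] - w_le, cwa[-1] - wa_le
    # Points left of b[m] contribute w*(b - a), points right of it w*(a - b).
    return b * (w_le - w_gt) - (wa_le - wa_gt)


def sliced_negative_distance_sum(x, w, y, P, rng):
    """Approximate s[m] = -sum_n w[n] * ||x[n] - y[m]|| with P random slices."""
    d = x.shape[1]
    # Assumed slicing constant c_d for the negative distance kernel.
    c_d = sqrt(pi) * exp(lgamma((d + 1) / 2) - lgamma(d / 2))
    # i.i.d. uniform directions on the sphere (normalized Gaussian vectors).
    xi = rng.standard_normal((P, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    s = np.zeros(y.shape[0])
    for p in range(P):
        # Project both point sets onto the direction and sum in 1D.
        s += abs_sum_1d(x @ xi[p], w, y @ xi[p])
    return -c_d * s / P
```

Called as `sliced_negative_distance_sum(x, w, y, P=1000, rng=np.random.default_rng(0))` with `x` of shape `(N, d)`, `w` of shape `(N,)` and `y` of shape `(M, d)`, the sketch needs one sort of $N$ values and $M$ binary searches per direction instead of $N\cdot M$ distance evaluations. With random directions the error decays at the $O(1/\sqrt{P})$ Monte Carlo rate that the QMC designs of the paper are meant to improve on.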

Paper Structure (52 sections, 4 theorems, 95 equations, 12 figures, 6 tables)

This paper contains 52 sections, 4 theorems, 95 equations, 12 figures, 6 tables.

Key Result

Theorem 1

Let $F\colon\mathbb{R}_{\ge 0}\to \mathbb{R}$ and $f\colon\mathbb{R}_{\ge 0}\to \mathbb{R}$ fulfill the slicing relation, equation \ref{eq:sliced_basis_function}.

Figures (12)

  • Figure 1: Log-log plot of the approximation error $|F(\|x\|)-\frac{1}{P}\sum_p f(|\langle\xi_p,x\rangle|)|$ for approximating the function $F$ by slicing, equation \ref{eq:approximation}, versus the number $P$ of projections (or the number $D=P$ of features for RFF and ORF) for different kernels and dimensions (left $d=3$, middle $d=10$, right $d=50$). The results are averaged over $50$ realizations of ${\boldsymbol{\xi}}^P$ and $1000$ realizations of $x$. The kernel parameters are set by the median rule with scaling factor $\gamma=1$. We fit a regression line in the log-log plot for each method to estimate the convergence rate $r$, see also Table \ref{tab:rates}.
  • Figure 2: Log-log plot of the relative $L^1$ approximation error versus computation time for computing the kernel sums in equation \ref{eq:kernel_sum} with different kernels and methods. We use the Letters dataset ($M=N=20000$ points), MNIST (reduced to dimension $d=20$ via PCA, $M=N=60000$ points) and FashionMNIST (reduced to dimension $d=30$ via PCA, $M=N=60000$ points). We run each method $10$ times. The shaded area indicates the standard deviation of the error. For Fourier slicing, we use $P=5\cdot 2^k$ slices for $k=1,\dots,10$. In order to obtain similar computation times, we use $5\cdot 2^{k-1}$ slices for RFF-$10$ slicing and $D=2P$ features for RFF and ORF.
  • Figure 3: The scaled variance of the Laplace kernel $f_L$ with $\alpha=1$ (in red) and the Gauss kernel $f_G$ with $\sigma=1/\sqrt{2}$ (in blue) for dimension $d=3$ (solid lines), $d=9$ (dashed lines) and $d=15$ (dotted lines). The variance is multiplied by $\|x\|$ (Laplace kernel) and by $\|x\|^2$ (Gauss kernel). We observe that the scaled variance increases monotonically in all cases and appears to be bounded from above by a constant (black solid line).
  • Figure 4: Log-log plot of the approximation error in equation \ref{eq:approximation} versus the number $P$ of projections for the negative distance kernel. The results are averaged over $50$ realizations of ${\boldsymbol{\xi}}^P$ and $1000$ realizations of $x$. We fit a regression line in the log-log plot for each method to estimate the convergence rate, see also Table \ref{tab:rates_energy}.
  • Figure 5: Log-log plot of the approximation error in equation \ref{eq:approximation} versus the number $P$ of projections for different kernels and dimensions (left $d=3$, middle $d=10$, right $d=50$). The results are averaged over $50$ realizations of ${\boldsymbol{\xi}}^P$ and $1000$ realizations of $x$. The kernel parameters are set by the median rule with scaling factor $\gamma=\frac{1}{2}$. We fit a regression line in the log-log plot for each method to estimate the convergence rate, see also Table \ref{tab:rates_einhalb}.
  • ...and 7 more figures

Theorems & Definitions (7)

  • Theorem 1
  • Definition 2
  • Theorem 3
  • Corollary 4
  • Proposition 5
  • proof
  • Remark 6