Fast Kernel Summation in High Dimensions via Slicing and Fourier Transforms

Johannes Hertrich

TL;DR

This paper proves that any radial kernel with an analytic basis function can be represented as a sliced version of some one-dimensional kernel, derives an analytic formula for the one-dimensional counterpart, and provides a run time comparison and error estimate for the resulting fast kernel summations.

Abstract

Kernel-based methods are heavily used in machine learning. However, they suffer from $O(N^2)$ complexity in the number $N$ of considered data points. In this paper, we propose an approximation procedure, which reduces this complexity to $O(N)$. Our approach is based on two ideas. First, we prove that any radial kernel with an analytic basis function can be represented as a sliced version of some one-dimensional kernel and derive an analytic formula for the one-dimensional counterpart. It turns out that the relation between one- and $d$-dimensional kernels is given by a generalized Riemann-Liouville fractional integral. Hence, we can reduce the $d$-dimensional kernel summation to a one-dimensional setting. Second, for solving these one-dimensional problems efficiently, we apply fast Fourier summations on non-equispaced data, a sorting algorithm, or a combination of both. Due to its practical importance, we pay special attention to the Gaussian kernel, where we show a dimension-independent error bound and represent its one-dimensional counterpart via a closed-form Fourier transform. We provide a run time comparison and error estimate of our fast kernel summations.
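The two ideas in the abstract (slice to one dimension, then sum quickly in one dimension) can be illustrated for the negative distance kernel $K(x,y)=-\|x-y\|$. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function names `abs_sum_1d` and `sliced_negative_distance_sum` are hypothetical, the one-dimensional sums are computed with a sort plus prefix sums, and the scaling constant `c_d` comes from the standard identity $\mathbb{E}_\xi|\langle\xi,z\rangle| = \|z\|\,\Gamma(\tfrac{d}{2})/(\sqrt{\pi}\,\Gamma(\tfrac{d+1}{2}))$ for $\xi$ uniform on the unit sphere. The Fourier-based summation mentioned in the abstract is not shown.

```python
import math
import numpy as np


def abs_sum_1d(x, w, y):
    """Return s[m] = sum_n w[n] * |x[n] - y[m]| via sorting and prefix sums."""
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    # Prefix sums with a leading zero: cw[k] sums the weights of the k smallest x.
    cw = np.concatenate(([0.0], np.cumsum(ws)))
    cwx = np.concatenate(([0.0], np.cumsum(ws * xs)))
    # k[m] = number of points x_n with x_n <= y_m.
    k = np.searchsorted(xs, y, side="right")
    left = y * cw[k] - cwx[k]                            # terms with x_n <= y_m
    right = (cwx[-1] - cwx[k]) - y * (cw[-1] - cw[k])    # terms with x_n > y_m
    return left + right


def sliced_negative_distance_sum(X, w, Y, num_projections=2000, seed=0):
    """Approximate s[m] = sum_n w[n] * (-||X[n] - Y[m]||) by random slicing."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Assumed scaling: ||z|| = c_d * E_xi |<xi, z>| for xi uniform on the sphere.
    c_d = math.sqrt(math.pi) * math.exp(
        math.lgamma((d + 1) / 2) - math.lgamma(d / 2)
    )
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    xi = rng.standard_normal((num_projections, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    out = np.zeros(Y.shape[0])
    for direction in xi:
        # Each projection reduces the problem to a one-dimensional summation.
        out -= abs_sum_1d(X @ direction, w, Y @ direction)
    return c_d * out / num_projections


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 10))
    Y = rng.standard_normal((5, 10))
    w = rng.random(1000)
    print(sliced_negative_distance_sum(X, w, Y))
```

Each projection costs one sort plus linear-time prefix sums, so the total work grows with the number of projections times $N\log N$ rather than with $N^2$. The result is a Monte Carlo estimate whose accuracy depends on the number of projections.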

Paper Structure

This paper contains 30 sections, 9 theorems, 82 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Lemma 2.1

Let ${\mathrm k}\colon\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ be a positive definite kernel. Then, the sliced kernel $K\colon\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ defined by the slicing formula (eq:sliced_kernel) is positive definite.
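
The referenced equation is not reproduced in this summary. As a hedged sketch only, assume that eq:sliced_kernel has the usual sliced form $K(x,y)=\mathbb{E}_{\xi}\big[\mathrm{k}(\langle\xi,x\rangle,\langle\xi,y\rangle)\big]$ with $\xi$ uniformly distributed on the unit sphere $\mathbb{S}^{d-1}$, and that positive definite means all Gram quadratic forms are non-negative. Under these assumptions, for points $x_1,\dots,x_n\in\mathbb{R}^d$ and coefficients $a_1,\dots,a_n\in\mathbb{R}$, the lemma follows from linearity of the expectation:

$$
\sum_{i,j=1}^{n} a_i a_j K(x_i,x_j)
= \mathbb{E}_{\xi}\Big[\sum_{i,j=1}^{n} a_i a_j\, \mathrm{k}\big(\langle\xi,x_i\rangle,\langle\xi,x_j\rangle\big)\Big]
\ge 0,
$$

since the inner sum is non-negative for every fixed $\xi$ by the positive definiteness of $\mathrm{k}$. The paper's own proof may proceed differently.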

Figures (5)

  • Figure 1: We plot the basis functions $F$ of the kernels $K(x,y)=F(\|x-y\|)$ from the kernel table (tab:kernels) and the basis functions $f$ from the corresponding one-dimensional kernels $\mathrm{k}(x,y)=f(|x-y|)$.
  • Figure 2: Plot of the Fourier transform $\hat{f}_1$ of $f_1(x)={_1}F_1(\tfrac{d}{2},\tfrac{1}{2},\tfrac{-x^2}{2})$ for different dimensions $d$.
  • Figure 3: Dependence of run time and the per-summand error of the fast kernel summation on the number $N$ of samples, the number $P$ of projections and the dimension $d$ for different kinds of kernels. From top to bottom: Gaussian kernel, Matérn kernel, negative distance kernel, Laplacian kernel.
  • Figure 4: Comparison of the error versus run time for Slicing and RFF for different dimensions and kernel parameters. Top three rows: Gaussian kernel, bottom three rows: Laplacian kernel.
  • Figure : Basis functions $F$ for different kernels $K(x,y)=F(\|x-y\|)$ and corresponding basis functions $f$ from $\mathrm{k}(x,y)=f(|x-y|)$.

Theorems & Definitions (22)

  • Lemma 2.1
  • proof
  • Proposition 2.2
  • Theorem 2.3
  • Example 2.4
  • Corollary 2.5
  • proof
  • Lemma 2.6
  • proof
  • Proposition 2.7
  • ...and 12 more