Relative Error Embeddings for the Gaussian Kernel Distance

Di Chen, Jeff M. Phillips

Abstract

A reproducing kernel can define an embedding of a data point into an infinite dimensional reproducing kernel Hilbert space (RKHS). The norm in this space describes a distance, which we call the kernel distance. The random Fourier features (of Rahimi and Recht) describe an oblivious approximate mapping into finite dimensional Euclidean space that behaves similarly to the RKHS. We show in this paper that for the Gaussian kernel the Euclidean norm between these mapped features has $(1+\varepsilon)$-relative error with respect to the kernel distance. When there are $n$ data points, we show that $O((1/\varepsilon^2) \log(n))$ dimensions of the approximate feature space are sufficient and necessary. Without a bound on $n$, but when the original points lie in $\mathbb{R}^d$ and have diameter bounded by $\mathcal{M}$, we show that $O((d/\varepsilon^2) \log(\mathcal{M}))$ dimensions are sufficient, and that this many are required, up to $\log(1/\varepsilon)$ factors.
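
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes the unit-normalized Gaussian kernel $K(p,q) = \exp(-\|p-q\|^2/(2\sigma^2))$ and the common cosine/sine form of random Fourier features with $m/2$ Gaussian frequencies; the paper's exact normalization may differ. The function names (`kernel_distance`, `sample_frequencies`, `rff_map`) and the test points are hypothetical and chosen only for illustration.

```python
import math
import random


def kernel_distance(p, q, sigma=1.0):
    """Exact Gaussian kernel distance: sqrt(K(p,p) + K(q,q) - 2 K(p,q))."""
    sq = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(2.0 - 2.0 * math.exp(-sq / (2.0 * sigma ** 2)))


def sample_frequencies(d, m, sigma=1.0, seed=0):
    """Draw m/2 frequency vectors with i.i.d. N(0, 1/sigma^2) coordinates.

    The draw does not depend on the data, which is what makes the map oblivious.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / sigma) for _ in range(d)] for _ in range(m // 2)]


def rff_map(x, omegas):
    """Map x to m = 2 * len(omegas) features: scaled cosines, then scaled sines."""
    scale = 1.0 / math.sqrt(len(omegas))
    projs = [sum(w * xi for w, xi in zip(omega, x)) for omega in omegas]
    return [scale * math.cos(t) for t in projs] + [scale * math.sin(t) for t in projs]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two illustrative points in R^3 with ||p - q|| >= 1 (assumed values).
p = [0.3, -1.2, 0.5]
q = [1.1, 0.4, -0.7]

omegas = sample_frequencies(d=3, m=4000, seed=0)
exact = kernel_distance(p, q)
approx = euclidean(rff_map(p, omegas), rff_map(q, omegas))
ratio = approx / exact

print(f"kernel distance: {exact:.4f}")
print(f"RFF distance:    {approx:.4f}")
print(f"ratio:           {ratio:.4f}")
```

With this normalization every mapped point has unit Euclidean norm, mirroring $K(x,x) = 1$, and the ratio should move closer to 1 as `m` grows.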

Paper Structure

This paper contains 26 sections, 15 theorems, 28 equations, 2 figures.

Key Result

lemma 1

Let $\Delta \in \mathbb{R}^d$ be such that $\|\Delta\| \geq 1$, and let $m = O((1/\varepsilon^2) \log(1/\delta))$ with $\varepsilon \in (0, 1/10)$ and $\delta \in (0,1)$. Then with probability at least $1-\delta$, we have $\frac{D_K(\Delta)}{D_{\hat{K}}(\Delta)} \in [1-\varepsilon,1+\varepsilon].$
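
To make the notation in the lemma concrete, the two distances can be written out under assumptions not stated on this page: a unit-normalized Gaussian kernel with $\sigma = 1$, $\Delta = p - q$, and random Fourier features built from $m/2$ frequencies $\omega_i \sim N(0, I_d)$ in cosine/sine form. The paper's own definitions may differ in normalization.

$$D_K(\Delta)^2 = K(p,p) + K(q,q) - 2K(p,q) = 2 - 2\exp\left(-\|\Delta\|^2/2\right)$$

$$D_{\hat{K}}(\Delta)^2 = 2 - \frac{2}{m/2} \sum_{i=1}^{m/2} \cos\langle \omega_i, \Delta \rangle$$

Since $\mathbb{E}[\cos\langle \omega_i, \Delta \rangle] = \exp(-\|\Delta\|^2/2)$, the estimate $D_{\hat{K}}(\Delta)^2$ is unbiased for $D_K(\Delta)^2$. Under these assumptions, $\|\Delta\| \geq 1$ gives $D_K(\Delta)^2 \geq 2 - 2e^{-1/2} \approx 0.787$, so the target is bounded away from zero and an additive concentration bound on the cosine average translates into a relative one.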

Figures (2)

  • Figure 1: Relative error $|\frac{\hat{R}_k}{R_k}-1|$ (in %) against $t$, with $n=2000$, $k=40$ and different bandwidths. Relative error is roughly stable across different values of $\sigma$, and consistently reduced by increasing $t$.
  • Figure 2: (left) Inverse squared relative errors. (right) Relative errors with varying distance.

Theorems & Definitions (28)

  • lemma 1
  • proof
  • lemma 2
  • proof
  • lemma 3
  • theorem 1
  • proof
  • lemma 4
  • proof
  • theorem 2
  • ...and 18 more