All Random Features Representations are Equivalent

Luke Sernau; Silvano Bonacina; Rif A. Saurous

TL;DR

This work derives an optimal sampling policy, under which all random features representations have the same approximation error, which is shown to be the lowest possible.

Abstract

Random features are a powerful technique for rewriting positive-definite kernels as linear products. They bring linear tools to bear in important nonlinear domains like KNNs and attention. Unfortunately, practical implementations require approximating an expectation, usually via sampling. This has led to the development of increasingly elaborate representations with ever lower sample error. We resolve this arms race by deriving an optimal sampling policy. Under this policy all random features representations have the same approximation error, which we show is the lowest possible. This means that we are free to choose whatever representation we please, provided we sample optimally.
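As a concrete illustration of rewriting a kernel as a linear product, here is a minimal sketch using the classic random Fourier features construction for a Gaussian kernel. This construction is a standard textbook example chosen for illustration, not necessarily the representation the paper analyzes; the function names (`gaussian_kernel`, `sample_params`, `features`, `approx_kernel`) and the unit bandwidth are assumptions made for this sketch.

```python
import math
import random


def gaussian_kernel(x, y):
    """Exact kernel K(x, y) = exp(-||x - y||^2 / 2) (unit bandwidth, assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist)


def sample_params(dim, num_features, rng):
    """Draw frequencies omega ~ N(0, I) and phases b ~ Uniform[0, 2*pi)."""
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
              for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return omegas, phases


def features(x, omegas, phases):
    """Map x to a finite feature vector whose dot products estimate K."""
    scale = math.sqrt(2.0 / len(omegas))
    return [
        scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(omegas, phases)
    ]


def approx_kernel(x, y, omegas, phases):
    """Linear product of feature vectors: a Monte Carlo estimate of K(x, y)."""
    fx = features(x, omegas, phases)
    fy = features(y, omegas, phases)
    return sum(a * b for a, b in zip(fx, fy))


if __name__ == "__main__":
    rng = random.Random(0)
    omegas, phases = sample_params(dim=2, num_features=20000, rng=rng)
    x, y = [0.3, -0.2], [0.1, 0.4]
    print("exact :", gaussian_kernel(x, y))
    print("approx:", approx_kernel(x, y, omegas, phases))
```

The estimate is only as good as the sample average behind it, which is the sample error the abstract refers to: with fewer features the approximation becomes noisier.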

Paper Structure

This paper contains 4 sections, 1 theorem, and 18 equations.

Introduction
Importance sampling
Optimal sampling
The choice of feature representation does not matter

Key Result

Theorem 1

Let $K$ be a positive-definite kernel such that [equation missing from source] for some $\phi$ and $\Omega$, and every $\Psi$ with the same support as $\Omega$. Suppose $x_1$ and $x_2$ are sampled from $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. Then the expected sample variance $\mathcal{V}_\Psi$ over all $x_1$ and $x_2$ will be minimized when [equation missing from source], where [equation missing from source]. The resulting optimal variance will be [equation missing from source].
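The role of a sampling distribution $\Psi$ that shares its support with $\Omega$ can be illustrated with a generic importance-sampling estimator. The sketch below is an assumption-laden illustration, not the optimal policy of Theorem 1: it uses a one-dimensional Gaussian kernel, takes the base distribution to be a standard normal, and picks a wider normal as the proposal arbitrarily. The names `normal_pdf`, `importance_sampled_kernel`, `exact_kernel`, and `proposal_sigma` are hypothetical.

```python
import math
import random


def normal_pdf(w, sigma):
    """Density of N(0, sigma^2) evaluated at w."""
    return math.exp(-0.5 * (w / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def exact_kernel(x1, x2):
    """1-D Gaussian kernel exp(-(x1 - x2)^2 / 2) (unit bandwidth, assumed)."""
    return math.exp(-0.5 * (x1 - x2) ** 2)


def importance_sampled_kernel(x1, x2, num_samples, proposal_sigma, rng):
    """Estimate K(x1, x2) by sampling frequencies from a proposal distribution.

    Frequencies are drawn from N(0, proposal_sigma^2) instead of the base
    distribution N(0, 1); each term is reweighted by the density ratio
    base / proposal so the estimator stays unbiased.
    """
    total = 0.0
    for _ in range(num_samples):
        w = rng.gauss(0.0, proposal_sigma)
        b = rng.uniform(0.0, 2.0 * math.pi)
        weight = normal_pdf(w, 1.0) / normal_pdf(w, proposal_sigma)
        total += weight * 2.0 * math.cos(w * x1 + b) * math.cos(w * x2 + b)
    return total / num_samples


if __name__ == "__main__":
    rng = random.Random(1)
    estimate = importance_sampled_kernel(0.5, -0.3, 40000, 1.5, rng)
    print("exact   :", exact_kernel(0.5, -0.3))
    print("estimate:", estimate)
```

Any proposal with the same support gives an unbiased estimate here, but the variance of the estimate depends on which proposal is chosen. Theorem 1 concerns the choice that minimizes that variance.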

Theorems & Definitions (4)

Definition 1
Definition 2
Theorem 1
proof
