Orthogonal Random Features: Explicit Forms and Sharp Inequalities

Nizar Demni, Hachem Kadri

TL;DR

This work analyzes the bias and the variance of the kernel approximation based on orthogonal random features, which make use of Haar orthogonal matrices, and derives sharp exponential bounds supporting the view that orthogonal random features are more informative than random Fourier features.

Abstract

Random features were introduced to scale up kernel methods via randomization techniques. In particular, random Fourier features and orthogonal random features have been used to approximate the popular Gaussian kernel. In this case, random Fourier features are built from a random Gaussian matrix. In this work, we analyze the bias and the variance of the kernel approximation based on orthogonal random features, which make use of Haar orthogonal matrices. We provide explicit expressions for these quantities using normalized Bessel functions, showing that orthogonal random features do not approximate the Gaussian kernel but a Bessel kernel. We also derive sharp exponential bounds supporting the view that orthogonal random features are less dispersed than random Fourier features.
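To make the two constructions concrete, here is a minimal sketch in plain Python. It rests on assumptions that the abstract does not spell out: both estimators have the form $\frac{1}{p}\sum_i \cos(w_i^\top (x-y))$, the RFF frequencies $w_i$ are rows of a standard Gaussian matrix, and the ORF frequencies are rows of $\sqrt{d}\,O$ with $O$ Haar orthogonal and $p \le d$. The $\sqrt{d}$ scaling and all function names are illustrative choices, not taken from the paper. Under these assumptions, each ORF frequency is uniform on the sphere of radius $\sqrt{d}$, so the ORF mean is the normalized Bessel function $j_{d/2-1}(\sqrt{d}\,z)$ with $z := \|x-y\|$, while the RFF mean is the Gaussian kernel $e^{-z^2/2}$.

```python
import math
import random


def gaussian_matrix(p, d, rng):
    """p x d matrix with i.i.d. N(0, 1) entries (rows are RFF frequencies)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]


def haar_orthogonal(d, rng):
    """Haar-distributed d x d orthogonal matrix via Gram-Schmidt on Gaussian rows."""
    rows = []
    for g in gaussian_matrix(d, d, rng):
        for q in rows:  # remove the components along rows already built
            dot = sum(a * b for a, b in zip(g, q))
            g = [a - dot * b for a, b in zip(g, q)]
        norm = math.sqrt(sum(a * a for a in g))
        rows.append([a / norm for a in g])
    return rows


def approx_kernel(W, x, y):
    """(1/p) * sum_i cos(w_i . (x - y)) over the rows w_i of W."""
    delta = [a - b for a, b in zip(x, y)]
    return sum(math.cos(sum(w * t for w, t in zip(row, delta))) for row in W) / len(W)


def normalized_bessel(nu, r, terms=60):
    """j_nu(r) = Gamma(nu + 1) * (2 / r)**nu * J_nu(r), summed from its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((r / 2.0) ** 2) / ((k + 1) * (nu + k + 1))
    return total


rng = random.Random(0)
d, p, trials = 4, 4, 10000
x = [0.5, -0.3, 0.8, 0.1]
y = [-0.4, 0.6, 0.2, -0.7]
z = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Monte Carlo mean of the RFF estimator (fresh Gaussian matrix per trial).
rff_mean = sum(approx_kernel(gaussian_matrix(p, d, rng), x, y) for _ in range(trials)) / trials

# Monte Carlo mean of the ORF estimator (fresh Haar matrix per trial, scaled by sqrt(d)).
orf_total = 0.0
for _ in range(trials):
    O = haar_orthogonal(d, rng)
    W = [[math.sqrt(d) * a for a in row] for row in O[:p]]
    orf_total += approx_kernel(W, x, y)
orf_mean = orf_total / trials

gaussian_kernel = math.exp(-z * z / 2.0)
bessel_kernel = normalized_bessel(d / 2.0 - 1.0, math.sqrt(d) * z)
print(f"RFF mean {rff_mean:.3f} vs Gaussian kernel {gaussian_kernel:.3f}")
print(f"ORF mean {orf_mean:.3f} vs Bessel kernel   {bessel_kernel:.3f}")
```

Gram-Schmidt is used here only to keep the sketch dependency-free; a QR factorization with a sign correction on the diagonal of $R$ is the more common route in numerical libraries.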

Paper Structure (11 sections, 5 theorems, 66 equations, 4 figures, 1 table)

Key Result

Theorem 1

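Let $\tilde{k}_{RFF}(x,y)$ be the RFF-based approximate kernel computed with $p$ random vectors in $\mathbb{R}^d$, and let $z := \|x-y\|$. Then its expectation and its variance are given by $\mathbb{E}[\tilde{k}_{RFF}(x,y)] = e^{-z^2/2}$ and $V[\tilde{k}_{RFF}(x,y)] = \frac{1}{2p}\left(1 - e^{-z^2}\right)^2$, respectively.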
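For the usual RFF estimator $\frac{1}{p}\sum_i \cos(w_i^\top (x-y))$ with $w_i \sim N(0, I_d)$ and unit bandwidth, the standard closed forms are $e^{-z^2/2}$ for the mean and $(1 - e^{-z^2})^2/(2p)$ for the variance, where $z := \|x-y\|$. The sketch below checks these values by simulation. It assumes that estimator form, and the function names `rff_estimate` and `rff_moments` are hypothetical. Because $w^\top (x-y)$ is distributed as $N(0, z^2)$, the simulation only needs scalar Gaussian draws.

```python
import math
import random


def rff_estimate(z, p, rng):
    """One draw of (1/p) * sum_i cos(g_i), with g_i ~ N(0, z^2)."""
    # For w ~ N(0, I_d), the projection w . (x - y) is N(0, z^2) with z = ||x - y||,
    # so the law of the estimator depends on x and y only through z.
    return sum(math.cos(rng.gauss(0.0, z)) for _ in range(p)) / p


def rff_moments(z, p):
    """Closed-form mean and variance of the estimator above (assumed form)."""
    mean = math.exp(-z * z / 2.0)
    var = (1.0 - math.exp(-z * z)) ** 2 / (2.0 * p)
    return mean, var


rng = random.Random(0)
z, p, trials = 1.5, 8, 50000
draws = [rff_estimate(z, p, rng) for _ in range(trials)]
emp_mean = sum(draws) / trials
emp_var = sum((v - emp_mean) ** 2 for v in draws) / (trials - 1)
mean, var = rff_moments(z, p)
print(f"mean: empirical {emp_mean:.4f}, closed form {mean:.4f}")
print(f"var:  empirical {emp_var:.4f}, closed form {var:.4f}")
```

The variance formula follows from $\mathbb{E}[\cos^2 g] = (1 + e^{-2z^2})/2$ for $g \sim N(0, z^2)$, which gives a per-feature variance of $(1 - e^{-z^2})^2/2$; averaging $p$ independent features divides it by $p$.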

Figures (4)

  • Figure 1: The absolute difference between theoretical and empirical values of the bias and the variance of ORF for different values of the number of random features $p$. Left: $|M_{emp} - \mathbb{E}[\tilde{k}_{ORF}(x,y)]|$. Right: $|V_{emp} - V[\tilde{k}_{ORF}(x,y)]|$. The theoretical values $\mathbb{E}[\tilde{k}_{ORF}(x,y)]$ and $V[\tilde{k}_{ORF}(x,y)]$ are computed using the explicit closed-form expressions provided in Theorems 2 and 4. $M_{emp}$ and $V_{emp}$ are the empirical bias and variance, respectively. Data points $x$ and $y$ are randomly generated from a normal distribution with zero mean and unit variance. We consider here the case where $z := \|x-y\|$ is not small ($z = 24$ in this simulation).
  • Figure 2: The bias of $\tilde{k}_{ORF}(x,y)$ and the bounds of Proposition 3 as a function of $z:=\|x-y\|$.
  • Figure 3: The variance of $\tilde{k}_{ORF}(x,y)$ and $\tilde{k}_{RFF}(x,y)$ as a function of $z:=\|x-y\|$.
  • Figure 4: Mean squared error (MSE) between the kernel matrix approximated by ORF or RFF and the full kernel matrix computed by the Bessel or the Gaussian kernel, for different values of the number of random features $p$.

Theorems & Definitions (12)

  • Theorem 1: Bias and variance of $\tilde{k}_{RFF}(x,y)$
  • proof
  • Theorem 2: Bias of $\tilde{k}_{ORF}(x,y)$
  • proof : Sketch of proof
  • Proposition 3
  • proof : Sketch of proof
  • Theorem 4: Variance of $\tilde{k}_{ORF}(x,y)$
  • proof : Sketch of proof
  • Remark 5
  • Proposition 6
  • ...and 2 more