Fast Spectrum Estimation of Some Kernel Matrices

Mikhail Lepilov

TL;DR

This work introduces a new eigenvalue quantile estimation framework for some kernel matrices that gives meaningful bounds for all the eigenvalues of a kernel matrix while avoiding the cost of constructing the full matrix.

Abstract

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It is desirable to know the eigenvalue decay properties of these matrices without explicitly forming them, such as when determining if a low-rank approximation is feasible. In this work, we introduce a new eigenvalue quantile estimation framework for some kernel matrices. This framework gives meaningful bounds for all the eigenvalues of a kernel matrix while avoiding the cost of constructing the full matrix. The kernel matrices under consideration come from a kernel with quick decay away from the diagonal applied to uniformly-distributed sets of points in Euclidean space of any dimension. We prove the efficacy of this framework given certain bounds on the kernel function, and we provide empirical evidence for its accuracy. In the process, we also prove a very general interlacing-type theorem for finite sets of numbers. Additionally, we indicate an application of this framework to the study of the intrinsic dimension of data, as well as several other directions in which to generalize this work.
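As a point of reference, the direct approach that the framework is meant to avoid can be sketched as follows: build the dense kernel matrix, compute its spectrum, and read off a numerical rank. This is a minimal sketch, not the paper's method. It assumes NumPy and uses the Gaussian kernels and the sizes (512 points, 32 subsampled points) named in the Figure 1.1 caption. The helper names, the relative tolerance `rel_tol`, the random seed, and the $n/m$ rescaling of the subsampled eigenvalues are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def gaussian_kernel_matrix(x, y, gamma):
    """Dense matrix with entries exp(-gamma * (x_i - y_j)^2) for 1-D points."""
    diff = x[:, None] - y[None, :]
    return np.exp(-gamma * diff**2)

def numerical_rank(eigvals, rel_tol=1e-6):
    """Count eigenvalues larger than rel_tol times the largest one."""
    eigvals = np.asarray(eigvals)
    return int(np.sum(eigvals > rel_tol * eigvals.max()))

def spectrum_ranks(gamma, n=512, m=32, seed=0):
    """Numerical rank of the full kernel matrix and of an m-point subsample.

    The subsample is a uniformly random subset of the points; rescaling its
    eigenvalues by n/m is an assumed convention for the "naive" comparison.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    sub = rng.choice(n, size=m, replace=False)

    A = gaussian_kernel_matrix(x, x, gamma)            # O(n^2) memory
    full = np.linalg.eigvalsh(A)[::-1]                 # descending order

    B = gaussian_kernel_matrix(x[sub], x[sub], gamma)  # only m x m
    approx = (n / m) * np.linalg.eigvalsh(B)[::-1]

    return numerical_rank(full), numerical_rank(approx)

if __name__ == "__main__":
    for gamma in (10.0, 100.0, 10000.0):
        full_rank, sub_rank = spectrum_ranks(gamma)
        print(f"gamma={gamma:g}: full rank ~ {full_rank}, subsample rank ~ {sub_rank}")
```

The dense route needs $O(n^2)$ memory and an $O(n^3)$ eigendecomposition, and an $m$-point subsample can report at most $m$ eigenvalues, so it cannot certify the numerical rank when that rank exceeds $m$.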

Paper Structure

This paper contains 6 sections, 4 theorems, 28 equations, 8 figures.

Key Result

Proposition 2.1

Let $S,T\subseteq\mathbb{R}_{\geq0}$ with $|S|=n$, $|T|=k$, and $k\mid n$. Denote by $a_i$ and $b_j$ the $i$th and $j$th largest elements of $S$ and $T$, respectively, and suppose $\sum_{i=1}^n\frac{a_i^r}{n}=\sum_{i=1}^k\frac{b_i^r}{k}$ for all $r=1,\ldots,k$. Then $b_{\lceil jk/n\rceil-1}\leq a_j\leq b_{\lceil jk/n\rceil+1}$ for all $j=1,\ldots,n$, where we define $b_0=0$ and $b_{k+1}=\infty$. (See Figure 2.2 for an illustration of this.)
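The moment-matching hypothesis can be explored numerically on the set $S$ from the Figure 2.2 caption. The sketch below is not taken from the paper: it recovers a moment-matched $T$ from the power sums $p_r=\frac{k}{n}\sum_i a_i^r$ via Newton's identities and a polynomial root-finder, then tests one reading of the conclusion, namely $b_{\lceil jk/n\rceil-1}\leq a_j\leq b_{\lceil jk/n\rceil+1}$ with both sets indexed in ascending order. That indexing is an assumption chosen to agree with the example $b_3\leq a_{10},a_{11},a_{12}\leq b_5$ and with the conventions $b_0=0$, $b_{k+1}=\infty$. The function names are hypothetical.

```python
import numpy as np

def moment_matched_set(S, k):
    """Return k numbers whose first k averaged power sums match those of S.

    Uses Newton's identities to turn power sums into elementary symmetric
    polynomials, then takes the roots of the resulting monic polynomial.
    """
    S = np.asarray(S, dtype=float)
    n = S.size
    # Power sums of T implied by (1/n) sum a_i^r = (1/k) sum b_i^r.
    p = [k / n * np.sum(S**r) for r in range(1, k + 1)]
    e = [1.0]  # e[m] is the m-th elementary symmetric polynomial of T
    for m in range(1, k + 1):
        acc = sum((-1) ** (i - 1) * e[m - i] * p[i - 1] for i in range(1, m + 1))
        e.append(acc / m)
    # prod (x - b_i) = x^k - e_1 x^(k-1) + e_2 x^(k-2) - ...
    coeffs = [(-1) ** m * e[m] for m in range(k + 1)]
    return np.sort(np.roots(coeffs).real)

def interlacing_holds(S, T):
    """Check b_{g-1} <= a_j <= b_{g+1}, g = ceil(j*k/n), in ascending order."""
    a = np.sort(np.asarray(S, dtype=float))
    b = np.sort(np.asarray(T, dtype=float))
    n, k = a.size, b.size
    padded = np.concatenate(([0.0], b, [np.inf]))  # padded[j] = b_j for j = 0..k+1
    for j in range(1, n + 1):
        g = (j * k + n - 1) // n  # integer ceiling of j*k/n
        if not (padded[g - 1] <= a[j - 1] <= padded[g + 1]):
            return False
    return True

S = [1, 2, 3, 4, 5, 7, 9, 12, 13, 14, 22, 23, 29, 30, 31]
T = moment_matched_set(S, 5)
print("T =", np.round(T, 5))
print("interlacing holds:", interlacing_holds(S, T))
```

Working from power sums keeps the example short, but root-finding on high-degree polynomials is ill-conditioned, so this approach is only suitable for small $k$ such as the $k=5$ used here.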

Figures (8)

  • Figure 1.1: The first 100 eigenvalues of the kernel matrix (blue) formed when $X$ consists of 512 points taken from the standard uniform distribution in one dimension, as well as those of its "naive" Nyström approximation (red) with 32 points. Here, the kernel used is $\kappa(x,y)=\exp(-10(x-y)^2)$ (top figure), $\kappa(x,y)=\exp(-100(x-y)^2)$ (middle figure), and $\kappa(x,y)=\exp(-10000(x-y)^2)$ (bottom figure). In the top figure, the eigenvalue decay of the subsampled matrix corresponds well with the eigenvalue decay of the full matrix, but in the middle and especially the bottom figures, this is no longer the case. This indicates that the Nyström method gives a reliable estimate of numerical rank only if we know a priori that the rank is low for the given kernel matrix, as in the top figure.
  • Figure 2.2: The sets $S=\{1,2,3,4,5,7,9,12,13,14,22,23,29,30,31\}$ (blue dots) and $T$ (solid red dashes), where $T$ is picked such that $\sum_{i=1}^{15}a_i^r/15=\sum_{i=1}^5b_i^r/5$ for $r=1,\ldots,5$. Hence, $T$ is approximately $\{1.51216,6.52312,9.54601,20.5897,30.1624\}$. Proposition 2.1 shows, for example, that $b_3\leq a_{10},a_{11},a_{12}\leq b_5$. This is illustrated with the blue arrows above.
  • Figure 2.3: An illustration of the quick-decay condition: the left figure is a heatmap of the Kronecker delta on the region $[0,1]\times[0,1]$, and the right figure is a heatmap of the Gaussian kernel $\kappa_1(x,y)=e^{-1000(x-y)^2}$ on the same region. Informally, we may think of the integral of the Kronecker delta over the blue subregion $[0,s]\times[0,s]$ (the length of the red diagonal) as $s$ times its integral over the entire region $[0,1]\times[0,1]$ (the length of the entire diagonal). Of course, both integrals are formally 0. Similarly, we can see that the integral of $\kappa_1$ over $[0,s]\times[0,s]$ is approximately $s$ times its integral over $[0,1]\times[0,1]$. This is contrasted with the case, for example, of the Gaussian kernel $\kappa_2(x,y)=e^{-(x-y)^2/10000}$, whose integral over $[0,s]\times[0,s]$ is approximately $s^2$ times its integral over $[0,1]\times[0,1]$. Thus, the quick-decay condition makes precise the way in which $\kappa_1$ does and $\kappa_2$ does not have fast decay away from the diagonal.
  • Figure 3.4: The averaged eigenvalues of $A$ (blue dots) together with the repeated, averaged eigenvalues of $B$ (red crosses and yellow circles), formed as in Example 1. Two averages over $m=256000$ runs of finding $B$ are shown, illustrating the variation inherent to our framework.
  • Figure 3.5: The averaged eigenvalues of $A$ (blue dots) together with the repeated, averaged eigenvalues of $B$ (red crosses and yellow circles), formed as in Example 2. Two averages over $m=128000$ runs of finding $B$ are shown, illustrating the variation inherent to our framework.
  • ...and 3 more figures
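
The informal scaling described in the Figure 2.3 caption can be checked numerically. The sketch below uses a closed form for the integral of a Gaussian kernel $e^{-\gamma(x-y)^2}$ over $[0,s]\times[0,s]$, obtained by substituting $t=x-y$: $\int_0^s\int_0^s e^{-\gamma(x-y)^2}\,dx\,dy = s\sqrt{\pi/\gamma}\,\mathrm{erf}(\sqrt{\gamma}\,s)-(1-e^{-\gamma s^2})/\gamma$. It checks only the caption's informal statement, not the paper's quick-decay condition itself, whose exact form is not reproduced here. The function names and the choice $s=0.5$ are hypothetical.

```python
import math

def gaussian_square_integral(gamma, s):
    """Integral of exp(-gamma * (x - y)^2) over the square [0, s] x [0, s]."""
    root = math.sqrt(gamma)
    # expm1 keeps (exp(-gamma s^2) - 1) accurate when gamma * s^2 is tiny.
    return (s * math.sqrt(math.pi / gamma) * math.erf(root * s)
            + math.expm1(-gamma * s * s) / gamma)

def subsquare_ratio(gamma, s):
    """Ratio of the integral over [0, s]^2 to the integral over [0, 1]^2."""
    return gaussian_square_integral(gamma, s) / gaussian_square_integral(gamma, 1.0)

s = 0.5
ratio_fast = subsquare_ratio(1000.0, s)  # kappa_1: sharply peaked on the diagonal
ratio_slow = subsquare_ratio(1e-4, s)    # kappa_2: nearly constant on the square
print(f"kappa_1 ratio: {ratio_fast:.4f}  (compare with s = {s})")
print(f"kappa_2 ratio: {ratio_slow:.4f}  (compare with s^2 = {s * s})")
```

For $\kappa_1$ the ratio is close to $s$ because almost all of the mass sits in a thin band around the diagonal, whose length inside the subsquare scales linearly with $s$; for $\kappa_2$ the kernel is nearly constant, so the ratio follows the area $s^2$.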

Theorems & Definitions (13)

  • Proposition 2.1
  • proof
  • Corollary 2.2
  • proof
  • Proposition 2.3
  • proof
  • Example 1
  • Proposition 2.4
  • proof
  • Example 2
  • ...and 3 more