Sketching the Heat Kernel: Using Gaussian Processes to Embed Data

Anna C. Gilbert, Kevin O'Neill

TL;DR

A novel, non-deterministic method for embedding data in low-dimensional Euclidean space by computing realizations of a Gaussian process that depends on the geometry of the data. The method is also robust to outliers.

Abstract

This paper introduces a novel, non-deterministic method for embedding data in low-dimensional Euclidean space based on computing realizations of a Gaussian process that depends on the geometry of the data. This type of embedding first appeared in (Adler et al., 2018) as a theoretical model for a generic manifold in high dimensions. In particular, we take the covariance function of the Gaussian process to be the heat kernel, so that computing the embedding amounts to sketching a matrix representing the heat kernel. The Karhunen-Loève expansion reveals that the straight-line distances in the embedding approximate the diffusion distance in a probabilistic sense, avoiding the need for sharp cutoffs and maintaining some of the smaller-scale structure. Our method has the further advantage of being robust to outliers. We justify the approach with both theory and experiments.
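
To make the idea in the abstract concrete, here is a minimal sketch in Python (NumPy). It is not the authors' algorithm. It assumes the heat kernel is approximated by $H = e^{-tL}$ for a symmetric normalized graph Laplacian $L$ built from Gaussian affinities. The function names and the parameters `t` (diffusion time), `eps` (affinity bandwidth), and `k` (embedding dimension) are hypothetical choices for illustration. Each embedding coordinate is one realization of a centered Gaussian process with covariance $H$, so the expected squared distance between embedded points $i$ and $j$ is $H_{ii} + H_{jj} - 2H_{ij}$, a discrete analogue of a squared diffusion distance.

```python
import numpy as np

def laplacian_eig(X, eps):
    """Eigendecomposition of a symmetric normalized graph Laplacian.

    Assumption: Gaussian affinities with bandwidth `eps`; the paper may
    construct its discrete heat kernel differently.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / eps)                      # pairwise affinities
    d = W.sum(axis=1)                          # degrees
    S = W / np.sqrt(np.outer(d, d))            # D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - S                     # normalized Laplacian
    lam, U = np.linalg.eigh(L)
    return np.clip(lam, 0.0, None), U          # clip tiny negative eigenvalues

def gp_embedding(X, k, t=1.0, eps=0.1, rng=None):
    """Embed the rows of X in R^k using k draws from N(0, H), H = exp(-t L)."""
    rng = np.random.default_rng(rng)
    lam, U = laplacian_eig(X, eps)
    H = (U * np.exp(-t * lam)) @ U.T           # discrete heat kernel
    root = (U * np.exp(-t * lam / 2)) @ U.T    # symmetric square root of H
    G = rng.standard_normal((len(X), k))       # i.i.d. Gaussian sketching matrix
    Y = root @ G / np.sqrt(k)                  # row i is the image of point i
    return Y, H

# Example: 40 points on the unit circle, embedded in R^3.
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
Y, H = gp_embedding(circle, k=3, rng=0)
```

Because the sketching matrix is random, repeated calls with different seeds give different embeddings. Squared distances concentrate around $H_{ii} + H_{jj} - 2H_{ij}$ as `k` grows, which is one way to read the "probabilistic sense" mentioned above.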

Paper Structure (19 sections, 16 theorems, 95 equations, 6 figures, 5 algorithms)

Key Result

Theorem 2.1

(Bérard et al.) Let $(M,g_M)$, $(\varphi_i)_{i=0}^\infty$, and $(\lambda_i)_{i=0}^\infty$ be as above. Then the map $\psi_t$ is an embedding of $M$ into $\ell^2$ for all $t>0$. Furthermore, the pullback metric $\psi_t^* d_{\ell^2}$ is asymptotic to $g_M$ as $t\to0$.

Figures (6)

  • Figure 1: Embeddings of $S^1$
  • Figure 2: Comparison of methods for two manifolds
  • Figure 3: Embeddings with outliers
  • Figure 4: Three sample embeddings of $S^1$ plus two outliers with GPS
  • Figure 5: Comparison of Gaussian process embeddings with i.i.d. Gaussian matrices and with symmetric Bernoulli matrices
  • ...and 1 more figure

Theorems & Definitions (22)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3: Karhunen-Loève Expansion
  • Proposition 2.4
  • proof
  • Theorem 2.5
  • Proposition 2.6
  • proof
  • Lemma 2.7
  • Lemma 2.8
  • ...and 12 more