Inferring manifolds using Gaussian processes

David B Dunson, Nan Wu

TL;DR

MrGap infers manifolds from high-dimensional noisy data with a local regression framework that replaces global manifold reconstruction with Gaussian process regression guided by the local covariance structure. The method builds local charts from the leading eigenvectors of a local covariance matrix, denoises points via GP regression, and interpolates new points using local parameterizations of the manifold, yielding probabilistic manifold reconstructions. Theoretical analysis provides bias/variance bounds for the local covariance under Gaussian noise, chart construction guarantees, and convergence results for the interpolation process, while numerical experiments demonstrate accurate denoising and smooth interpolation on a Cassini oval, a torus, and real bird vocalization data. The approach relaxes restrictive distributional assumptions and offers a practical pipeline for denoising and interpolating data near unknown manifolds, with potential applications to ecological and spectrogram data.

Abstract

It is often of interest to infer lower-dimensional structure underlying complex data. As a flexible class of non-linear structures, it is common to focus on Riemannian manifolds. Most existing manifold learning algorithms replace the original data with lower-dimensional coordinates without providing an estimate of the manifold or using the manifold to denoise the original data. This article proposes a new methodology to address these problems, allowing interpolation of the estimated manifold between the fitted data points. The proposed approach is motivated by the novel theoretical properties of local covariance matrices constructed from samples near a manifold. Our results enable us to turn a global manifold reconstruction problem into a local regression problem, allowing for the application of Gaussian processes for probabilistic manifold reconstruction. In addition to the theory justifying our methodology, we provide simulated and real data examples to illustrate the performance.
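To make the role of the local covariance matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hard cutoff of radius `eps` around a center point and a `1/n` normalization; the function name `local_chart_basis` and the toy noisy circle are illustrative choices that do not come from the paper. The leading eigenvectors of the local covariance approximate the tangent directions of the manifold at the center.

```python
import numpy as np

def local_chart_basis(points, center, eps):
    """Eigen-decompose the local covariance at `center`.

    Returns (U, eigenvalues) with eigenvalues in descending order, so the
    first d columns of U estimate the tangent directions of a d-dimensional
    manifold near `center`. The hard cutoff and 1/n scaling are assumptions.
    """
    diffs = points - center
    near = diffs[np.linalg.norm(diffs, axis=1) <= eps]
    # Sum of outer products of the displacements of nearby samples.
    cov = near.T @ near / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order], eigvals[order]

# Toy data (hypothetical): a noisy unit circle embedded in R^3, so d = 1, D = 3.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 2000)
circle = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
noisy = circle + 0.01 * rng.standard_normal(circle.shape)

U, vals = local_chart_basis(noisy, np.array([1.0, 0.0, 0.0]), eps=0.3)
tangent = U[:, 0]  # estimated tangent direction at (1, 0, 0), up to sign
```

On this toy example the true tangent at $(1, 0, 0)$ is $\pm(0, 1, 0)$, so `tangent` should be nearly aligned with the second coordinate axis. The sign of an eigenvector is arbitrary, which is harmless because the same basis is used to map into and out of the chart.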

Paper Structure

This paper contains 54 sections, 20 theorems, 142 equations, 20 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

Under the manifold-with-noise assumption and the translation-and-rotation assumption, suppose $\epsilon$ is small enough depending on $d$, $D$, the scalar curvature of $M$, and the second fundamental form of $\iota(M)$. For $\beta>1$, if $\sigma \leq \min \{\frac{1}{\sqrt{-4(d+5)\log \epsilon}}, \frac{1}{\sqrt{12\log(2n)}} \}\, \epsilon \ldots$ where the top left block of $\mathcal{E}$ is a $d$ by $d$ matrix. The constant factors in …

Figures (20)

  • Figure 1: Top row: $5$ out of $83$ spectrograms corresponding to a call type of Anthus trivialis. The rows of the $\mathbb{R}^{75 \times 197}$ matrices correspond to frequency (kHz), and the columns correspond to time (ms) and entries measure amplitude of the signal. Samples 2 and 4 are near Sample 1. Second and third rows: $10$ generated samples near Sample 1 using MrGap. Region 2 in the generated sample is similar to Sample 1, while region 1 is similar to Samples 2 and 4.
  • Figure 2: Left Panel: The solid lines are the standard coordinates of $\mathbb{R}^D$. The solid curve is the plot of $[u, F(u)]^\top$. The dashed lines show the rotation of the coordinates under $U$ and the plot of $U [u, F(u)]^\top$. Right Panel: The solid curve is $\iota(M)$. After a translation by $y$, the graph of $y+U [u, F(u)]^\top$ gives a parametrization of $B^{\mathbb{R}^D}_{\delta}(y) \cap \iota(M)$.
  • Figure 3: Left Panel: The solid curve is $\iota(M)$. The solid lines are the affine subspace $\mathcal{H}_k$ and the affine normal subspace of $\mathcal{H}_k$. The blue points are noisy data points in the $\delta$ neighborhood of $y_k$. Right Panel: The solid lines are coordinates in $\mathbb{R}^D$ and the graph of the function $F_k$. After (a) a translation by $-y_k$ and (b) a rotation by $U_{n,\epsilon}(y_k)^\top$, the blue points in the left panel (the noisy data points) are mapped to the blue points in the right panel. In particular, $y_k$ is mapped to the origin. If we project the blue points in the right panel, except $0 \in \mathbb{R}^D$, onto $\mathbb{R}^d$ and $\mathbb{R}^{D-d}$, which are the last operations (c) in $\mathcal{P}_{y_k}$ and $\mathcal{P}^\bot_{y_k}$, then we obtain the inputs $w_{k,j}$ and the response variables $z_{k,j}$ for $F_k$, respectively. The green point in the right panel is $[0, F_k(0)]^\top$. We apply the denoising equation to get the denoised output $\hat{y}_k$, indicated by the green point in the left panel.
  • Figure 4: We focus our discussion on the point $\textbf{y}_2$. Left Panel: The blue points are $B^{\mathbb{R}^D}_{\delta}(\textbf{y}_2)\cap\{y_i\}_{i=1}^n$. Suppose we interpolated $K$ points around $\textbf{y}_1$ on $\iota(M)$ whose union is $\tilde{M}_{1}$. We find $\tilde{M}_{1} \cap B^{\mathbb{R}^D}_{\delta}(\textbf{y}_2)$ (orange points). Right Panel: The solid curve is the graph of the function $F_2$. After a translation by $-\textbf{y}_2$ and a rotation by $U_{n,\epsilon}(\textbf{y}_2)^\top$, the blue points and the orange points in the left panel are mapped to the blue points and orange points in the right panel. The purple points $\{\tilde{\textbf{u}}_{2,j}\}_{j=1}^K$ are generated in the domain $O_2$ of $F_2$ in $\mathbb{R}^d$ for the prediction. The blue points, except $0$, and the orange points are projected onto $\mathbb{R}^d$ and $\mathbb{R}^{D-d}$ to serve as the predictors and the response variables for the predictions, whose coordinates are $\{[\tilde{\textbf{u}}_{2, j}, F_2(\tilde{\textbf{u}}_{2, j})]^\top\}_{j=1}^K$ (green points in the right panel). The green points in the right panel are mapped back to the green points in the left panel through the corresponding denoising equation, which gives the interpolations around $\textbf{y}_2$.
  • Figure 5: Both $O_k$ and $O^{(\texttt{I}-1)}_k$ are contained in $B^{\mathbb{R}^d}_{3\delta}(0)$. Left: An illustration of $O_k \subset \mathbb{R}^d$ where $O_k$ contains a small round ball. Right: An illustration of $O^{(\texttt{I}-1)}_k \subset \mathbb{R}^d$ where $O^{(\texttt{I}-1)}_k$ contains a larger round ball of radius close to $3\delta$.
  • ...and 15 more figures
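
The chart construction described in the captions of Figures 2 to 4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the MrGap algorithm itself: it uses a squared-exponential kernel with fixed, hand-picked hyperparameters (`length_scale`, `amplitude`, `noise_std`), a single round of prediction without reusing previously interpolated points, and a toy noisy circle. The names `rbf_kernel` and `chart_predict` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, amplitude=1.0):
    """Squared-exponential kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return amplitude**2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale**2))

def chart_predict(points, k, u_new, d, eps, delta, noise_std=0.05):
    """Predict manifold points y_k + U [u, F_k(u)]^T for chart inputs u_new.

    u_new has shape (m, d). Passing a single zero row gives a denoised
    version of points[k]; other rows give interpolated points near it.
    """
    y_k = points[k]
    diffs = points - y_k                      # (a) translate by -y_k
    dist = np.linalg.norm(diffs, axis=1)
    near = diffs[dist <= eps]
    _, vecs = np.linalg.eigh(near.T @ near)   # local covariance, up to scaling
    U = vecs[:, ::-1]                         # leading eigenvectors first
    mask = dist <= delta
    mask[k] = False                           # y_k maps to the origin; exclude it
    rotated = diffs[mask] @ U                 # (b) rows are U^T (y_j - y_k)
    w, z = rotated[:, :d], rotated[:, d:]     # (c) inputs and responses for F_k
    K = rbf_kernel(w, w) + noise_std**2 * np.eye(len(w))
    f_new = rbf_kernel(u_new, w) @ np.linalg.solve(K, z)  # GP posterior mean
    return y_k + np.hstack([u_new, f_new]) @ U.T          # map back to R^D

# Toy data (hypothetical): a noisy unit circle in R^2, so d = 1, D = 2.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 500)
noisy = np.column_stack([np.cos(theta), np.sin(theta)])
noisy = noisy + 0.05 * rng.standard_normal(noisy.shape)

origin = np.zeros((1, 1))
denoised = np.vstack([chart_predict(noisy, k, origin, d=1, eps=0.5, delta=0.5)
                      for k in range(30)])
err_before = np.abs(np.linalg.norm(noisy[:30], axis=1) - 1.0).mean()
err_after = np.abs(np.linalg.norm(denoised, axis=1) - 1.0).mean()

u_grid = np.linspace(-0.2, 0.2, 5)[:, None]
interpolated = chart_predict(noisy, 0, u_grid, d=1, eps=0.5, delta=0.5)
```

Here `err_before` and `err_after` measure the average distance to the true circle before and after denoising the first 30 points, and `interpolated` holds five new points near the first sample. Only the posterior mean is computed; a probabilistic reconstruction would also use the GP posterior covariance, which is omitted to keep the sketch short.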

Theorems & Definitions (36)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Definition B.1
  • Definition B.2
  • Definition B.3
  • Proposition B.1: Proposition 1 of Boissonnat et al. (2019)
  • Proposition B.2
  • proof
  • ...and 26 more