Estimation of Local Geometric Structure on Manifolds from Noisy Data

Yariv Aizenbud, Barak Sober

TL;DR

This work tackles the problem of estimating a point on an unknown smooth manifold $\mathcal{M}\subset\mathbb{R}^D$ from noisy samples drawn from a tubular neighborhood, together with the local tangent space at the projected point. The authors propose a two-step, locally geometric algorithm inspired by Manifold-MLS: first obtain a robust local coordinate system that approximates the tangent, then iteratively refine this coordinate system while performing local regression to project the query point onto $\mathcal{M}$ and estimate $T_{\mathbf{p}}\mathcal{M}$. They prove high-probability convergence of $\hat p_n$ to the true projection $\mathbf{p}$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ to $T_{\mathbf{p}}\mathcal{M}$, with asymptotic rates $\mathrm{dist}(\hat p_n,\mathcal{M}) = \widetilde{\mathcal{O}}(n^{-k/(2k+d)})$, $\|\hat p_n-\mathbf{p}\| = \widetilde{\mathcal{O}}(n^{-(k-1)/(2k+d)})$, and $\angle_{\max}(\widehat{T_{\hat p_n}\mathcal{M}},T_{\mathbf{p}}\mathcal{M}) = \widetilde{\mathcal{O}}(n^{-(k-1)/(2k+d)})$ for large $n$, up to log factors. The method leverages a tilted local graph representation and carefully handles bias induced by the tubular noise via iterative tangent refinement, providing concrete guarantees for point estimation and tangent-space recovery with potential applications in denoising and geometric data processing on manifolds.
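The two-step idea can be illustrated in a toy setting. The following is a minimal sketch, not the authors' algorithm: it assumes a unit circle in $\mathbb{R}^2$ ($d=1$, $D=2$), uses plain local PCA as a stand-in for the robust initial coordinate system, and uses an unweighted least-squares polynomial fit as a stand-in for the local regression. The function names (`sample_annulus`, `initial_tangent`, `refine`) and the bandwidth `h` are hypothetical choices made for this illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def sample_annulus(n, sigma, rng):
    """Draw n points uniformly (in area) from the sigma-tube around the unit circle."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    radius = np.sqrt(rng.uniform((1.0 - sigma) ** 2, (1.0 + sigma) ** 2, n))
    return np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])


def initial_tangent(points, r, h):
    """Step 1 (stand-in): leading principal direction of the samples within distance h of r."""
    nbrs = points[np.linalg.norm(points - r, axis=1) < h]
    _, _, vt = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
    return vt[0]  # unit vector; its sign is arbitrary


def refine(points, r, tangent, h, degree=2, iters=5):
    """Step 2 (stand-in): alternate a local polynomial fit with a frame update."""
    for _ in range(iters):
        normal = np.array([-tangent[1], tangent[0]])
        # Coordinates of the samples in the current frame centred at r.
        x = (points - r) @ tangent
        y = (points - r) @ normal
        box = (np.abs(x) < h) & (np.abs(y) < h)
        # Coefficients are ordered from low to high degree: c[0] + c[1] * x + ...
        c = P.polyfit(x[box], y[box], degree)
        # The fitted value at x = 0 gives the point estimate along the normal.
        p_hat = r + c[0] * normal
        # The fitted slope at x = 0 tilts the frame towards the graph's tangent.
        tangent = tangent + c[1] * normal
        tangent = tangent / np.linalg.norm(tangent)
    return p_hat, tangent


rng = np.random.default_rng(0)
pts = sample_annulus(20000, 0.05, rng)
r = 1.03 * np.array([np.cos(0.7), np.sin(0.7)])
p_hat, t_hat = refine(pts, r, initial_tangent(pts, r, 0.4), 0.4)
print("estimated point:", p_hat, "estimated tangent:", t_hat)
```

In this sketch the frame stays centred at the query point, so once the fitted slope at the origin vanishes, the offset from the query point to the estimate is orthogonal to the estimated tangent, which is the defining property of a projection.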

Abstract

A common observation in data-driven applications is that high-dimensional data have a low intrinsic dimension, at least locally. In this work, we consider the problem of point estimation for manifold-valued data. Namely, given a finite set of noisy samples of $\mathcal{M}$, a $d$-dimensional submanifold of $\mathbb{R}^D$, and a point $r$ near the manifold, we aim to project $r$ onto the manifold. Assuming that the data was sampled uniformly from a tubular neighborhood of a $k$-times smooth, boundaryless, and compact manifold, we present an algorithm that takes $r$ from this neighborhood and outputs $\hat p_n\in \mathbb{R}^D$ and $\widehat{T_{\hat p_n}\mathcal{M}}$, an element of the Grassmannian $Gr(d, D)$. We prove that as the number of samples $n\to\infty$, the point $\hat p_n$ converges to $\mathbf{p}\in \mathcal{M}$, the projection of $r$ onto $\mathcal{M}$, and $\widehat{T_{\hat p_n}\mathcal{M}}$ converges to $T_{\mathbf{p}}\mathcal{M}$ (the tangent space at that point) with high probability. Furthermore, we show that $\hat p_n$ approaches the manifold with an asymptotic rate of $n^{-\frac{k}{2k + d}}$, and that $\hat p_n$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ approach $\mathbf{p}$ and $T_{\mathbf{p}}\mathcal{M}$, respectively, with asymptotic rates of $n^{-\frac{k-1}{2k + d}}$.
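To make the stated rates concrete, the exponents $\frac{k}{2k+d}$ and $\frac{k-1}{2k+d}$ can be evaluated for specific smoothness $k$ and intrinsic dimension $d$. The helper below is a hypothetical illustration (the name `rate_exponents` is not from the paper); it only evaluates the two formulas quoted above and says nothing about constants or log factors.

```python
from fractions import Fraction


def rate_exponents(k, d):
    """Return (k / (2k + d), (k - 1) / (2k + d)) as exact fractions."""
    denom = 2 * k + d
    return Fraction(k, denom), Fraction(k - 1, denom)


# Example: a twice-smooth curve (k = 2, d = 1) and a smoother surface (k = 3, d = 2).
for k, d in [(2, 1), (3, 2)]:
    r0, r1 = rate_exponents(k, d)
    print(f"k={k}, d={d}: distance-to-manifold exponent {r0}, point/tangent exponent {r1}")
```

For $k=2$, $d=1$ the exponents are $2/5$ and $1/5$: the distance of $\hat p_n$ to the manifold decays faster than its distance to the specific projection $\mathbf{p}$, and the exponents depend on the intrinsic dimension $d$ rather than the ambient dimension $D$.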

Paper Structure

This paper contains 32 sections, 53 theorems, 469 equations, 15 figures, 4 algorithms.

Key Result

Theorem 3.1

Assume $M > C_\tau \sqrt{D\log D}$ for some constant $C_\tau$ independent of $\tau$, and let $r\in \mathcal{M}_\sigma$. Then, for any $\delta>0$ arbitrarily small, there exists $N$ such that for any number of samples $n > N$, applying Algorithm alg:step2_clean with inputs $q_{-1}, H_0$ being the out and with probability of at least $1 - \delta$, where $r_0 = \frac{k}{2k +d}$, $r_1 = \frac{k-1}{2k

Figures (15)

  • Figure 1: Illustration of a manifold $\mathcal{M}$ (marked by the blue line) along with its tubular neighborhood $\mathcal{M}_\sigma$. Assuming uniform sampling in $\mathcal{M}_\sigma$, the point $p\in\mathcal{M}$ is marked in green and the expected value with respect to the given coordinate system in purple. (a) The coordinate system is aligned with the tangent. (b) The coordinate system is tilted with respect to the tangent. In (a) the two points coincide, whereas in (b) the expected value differs from the point we wish to estimate.
  • Figure 2: Road-map for the proof of Theorem \ref{thm:Step2}
  • Figure 3: Illustration of $\mathcal{M}$ as a graph of a function $f_\ell$ (marked by the red solid line) above the coordinate system $f_\ell$. The boundary of $\mathcal{M}_\sigma$ is delineated by the pink lines, and $\widetilde{f}_\ell$ is the conditional expectation $\mathbb{E}[\eta_\ell(y ~|~ x)]$ of this domain with respect to the presented $y$-axis.
  • Figure 4: Illustration of $g(0,\vec{y})$ and $\Omega(0)$ in the two dimensional case. Let $H$ be some local coordinate system and consider $\mathcal{M}$ as a local graph of some function $f:H\to H^\perp$. The upper bound for the values of the sample distribution above $0$ in some direction $\theta$ of $H^\perp$ is $g(0,\theta)$. This value is $\sigma$-away from some point on $\mathcal{M}$. We denote this point by $f(\widetilde{x}(0))\in \mathcal{M}$.
  • Figure 5: Geodesic "walk" on a circle. The red point is the initial point. The yellow points are the data set. The red point is then projected onto the estimation of the blue circle. Then, at each step, a new point is generated along the circle (the black arrows connect the points). The plot illustrates 30 steps.
  • ...and 10 more figures
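
The effect described in Figure 1 can be checked numerically in a simplified setting. The sketch below is an illustration under stated assumptions rather than the paper's construction: it takes the unit circle in $\mathbb{R}^2$, a point $p=(1,0)$ on it, and a coordinate system whose normal axis is tilted by an angle `alpha` from the true normal at $p$. Under uniform sampling in the annulus of half-width `sigma`, the samples on the normal axis through $p$ are uniform on the nearby segment where that axis crosses the annulus, so their mean is the segment midpoint. The function name `slice_mean_offset` is hypothetical.

```python
import math


def slice_mean_offset(alpha, sigma):
    """Signed offset from p = (1, 0) to the midpoint of the tilted normal slice.

    The slice is the set {p + s * n : 1 - sigma <= |p + s * n| <= 1 + sigma} near p,
    with n = (cos(alpha), sin(alpha)). Solving |p + s * n| = R gives
    s = -cos(alpha) + sqrt(cos(alpha)**2 + R**2 - 1) for the intersection near p.
    """
    c = math.cos(alpha)
    s_out = -c + math.sqrt(c * c + 2.0 * sigma + sigma ** 2)  # R = 1 + sigma
    s_in = -c + math.sqrt(c * c - 2.0 * sigma + sigma ** 2)   # R = 1 - sigma
    return 0.5 * (s_in + s_out)


for alpha in (0.0, 0.25, 0.5):
    print(f"tilt {alpha:.2f} rad: offset of slice mean from p = {slice_mean_offset(alpha, 0.05):.2e}")
```

With no tilt the slice is symmetric about $p$ and the offset vanishes; with a tilted frame the midpoint moves away from $p$, which is the kind of bias that the iterative tangent refinement is meant to remove.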

Theorems & Definitions (104)

  • Definition 1: Reach (Federer, 1959)
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • proof: Proof of Theorem \ref{thm:Step1}
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • ...and 94 more