Estimation of Local Geometric Structure on Manifolds from Noisy Data
Yariv Aizenbud, Barak Sober
TL;DR
This work addresses the problem of estimating a point on an unknown smooth manifold $\mathcal{M}\subset\mathbb{R}^D$, together with the tangent space at that point, from noisy samples drawn from a tubular neighborhood of $\mathcal{M}$. The authors propose a two-step local algorithm inspired by Manifold-MLS: first obtain a robust local coordinate system that approximates the tangent space, then iteratively refine this coordinate system while performing local regression to project the query point $r$ onto $\mathcal{M}$ and estimate $T_{\mathbf{p}}\mathcal{M}$. They prove that, with high probability, $\hat p_n$ converges to the true projection $\mathbf{p}$ of $r$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ converges to $T_{\mathbf{p}}\mathcal{M}$, with asymptotic rates $\mathrm{dist}(\hat p_n,\mathcal{M}) = \widetilde{\mathcal{O}}(n^{-k/(2k+d)})$, $\|\hat p_n-\mathbf{p}\| = \widetilde{\mathcal{O}}(n^{-(k-1)/(2k+d)})$, and $\angle_{\max}(\widehat{T_{\hat p_n}\mathcal{M}},T_{\mathbf{p}}\mathcal{M}) = \widetilde{\mathcal{O}}(n^{-(k-1)/(2k+d)})$, where $n$ is the number of samples, $d$ is the dimension of $\mathcal{M}$, $k$ is its order of smoothness, and $\widetilde{\mathcal{O}}$ hides logarithmic factors. The method represents the manifold locally as a graph over a tilted coordinate system and uses iterative tangent refinement to control the bias induced by the tubular noise, yielding guarantees for point estimation and tangent-space recovery with potential applications in denoising and geometric data processing on manifolds.
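To give a feel for the gap between the two exponents, here is a worked instance under illustrative assumptions that do not come from the paper: a twice-smooth curve ($k=2$, $d=1$) and $n=10^5$ samples, with constants and logarithmic factors ignored.

$$
\frac{k}{2k+d}=\frac{2}{5},\qquad \frac{k-1}{2k+d}=\frac{1}{5},\qquad n^{-2/5}=10^{-2},\qquad n^{-1/5}=10^{-1}.
$$

In this instance the bound on the distance from $\hat p_n$ to $\mathcal{M}$ shrinks an order of magnitude faster than the bounds on $\|\hat p_n-\mathbf{p}\|$ and on the tangent angle.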
Abstract
A common observation in data-driven applications is that high-dimensional data have a low intrinsic dimension, at least locally. In this work, we consider the problem of point estimation for manifold-valued data. Namely, given a finite set of noisy samples of $\mathcal{M}$, a $d$-dimensional submanifold of $\mathbb{R}^D$, and a point $r$ near the manifold, we aim to project $r$ onto the manifold. Assuming that the data was sampled uniformly from a tubular neighborhood of a compact, boundaryless, $k$-times smooth manifold, we present an algorithm that takes $r$ from this neighborhood and outputs a point $\hat p_n\in \mathbb{R}^D$ and an element $\widehat{T_{\hat p_n}\mathcal{M}}$ of the Grassmannian $Gr(d, D)$. We prove that, with high probability, as the number of samples $n\to\infty$, the point $\hat p_n$ converges to $\mathbf{p}\in \mathcal{M}$, the projection of $r$ onto $\mathcal{M}$, and $\widehat{T_{\hat p_n}\mathcal{M}}$ converges to $T_{\mathbf{p}}\mathcal{M}$, the tangent space at that point. Furthermore, we show that $\hat p_n$ approaches the manifold at an asymptotic rate of $n^{-\frac{k}{2k + d}}$, and that $\hat p_n$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ approach $\mathbf{p}$ and $T_{\mathbf{p}}\mathcal{M}$, respectively, at an asymptotic rate of $n^{-\frac{k-1}{2k + d}}$.
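The general idea of estimating a point and a tangent from noisy samples can be illustrated with a minimal sketch. This is not the authors' algorithm: it handles only curves ($d=1$), takes the coarse frame from plain local PCA, and does a single polynomial regression pass with no iterative refinement. The function name `estimate_point_and_tangent`, the neighborhood `radius`, the polynomial `degree`, and the noisy-circle test data are all hypothetical choices made for illustration.

```python
import numpy as np

def estimate_point_and_tangent(samples, r, radius=0.3, degree=2):
    """Illustrative sketch for a curve (d = 1) in R^D; not the paper's method."""
    # Keep only the samples in a Euclidean ball around the query point r.
    nbrs = samples[np.linalg.norm(samples - r, axis=1) < radius]
    # Step 1: coarse local frame from PCA of the centered neighbors.
    _, _, vt = np.linalg.svd(nbrs - nbrs.mean(axis=0), full_matrices=False)
    u, normals = vt[0], vt[1:].T          # u: (D,), normals: (D, D-1)
    # Step 2: view the neighbors as a noisy graph over the line r + x * u
    # and fit a polynomial to each normal coordinate by least squares.
    x = (nbrs - r) @ u                    # tangential coordinate, shape (M,)
    y = (nbrs - r) @ normals              # normal coordinates, shape (M, D-1)
    coef = np.polyfit(x, y, degree)       # shape (degree+1, D-1), highest power first
    p_hat = r + normals @ coef[-1]        # fitted graph evaluated at x = 0
    t_hat = u + normals @ coef[-2]        # fitted graph derivative at x = 0
    return p_hat, t_hat / np.linalg.norm(t_hat)

# Hypothetical test data: a unit circle with radial noise of width 0.05,
# used here as a simple stand-in for sampling from a tubular neighborhood.
rng = np.random.default_rng(0)
n = 5000
theta = rng.uniform(0.0, 2.0 * np.pi, n)
radii = 1.0 + rng.uniform(-0.05, 0.05, n)
samples = np.column_stack([radii * np.cos(theta), radii * np.sin(theta)])

a = 0.7
r = 1.03 * np.array([np.cos(a), np.sin(a)])    # query point inside the tube
p_true = np.array([np.cos(a), np.sin(a)])      # projection of r onto the circle
t_true = np.array([-np.sin(a), np.cos(a)])     # unit tangent at p_true

p_hat, t_hat = estimate_point_and_tangent(samples, r)
print("point error:", np.linalg.norm(p_hat - p_true))
print("tangent alignment:", abs(t_hat @ t_true))
```

The sign of the estimated tangent is arbitrary, which is why the comparison uses the absolute value of the inner product. Extending the sketch to $d>1$ would require multivariate polynomial features in place of `np.polyfit`.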
