Manifold Fitting under Unbounded Noise

Zhigang Yao, Yuqing Xia

TL;DR

This work tackles fitting a latent $d$-dimensional manifold ${\cal M} \subset \mathbb{R}^D$ from data corrupted by unbounded Gaussian noise $\xi_i \sim G_{\sigma}$ where $x_i = y_i + \xi_i$ and $y_i \sim U({\cal M})$. It introduces ${\cal M}_{out}$ via an implicit bias function $f$ built from a weighted tangent-space estimator $\Psi_x^{\alpha}$ that aggregates local projections $P_{x_i}$ at projected points, rather than the noisy samples themselves. Theoretical contributions show that, with high probability, ${\cal M}_{out}$ is a $d$-dimensional, smooth manifold with a Hausdorff distance to ${\cal M}$ of order $O(r^2)$ when $r = O(\sqrt{\sigma})$, and that the bias and derivatives of $f$ are tightly controlled (e.g., $\|f(x)\|_2 \le C r^2$ on ${\cal M}$, and $\|\partial_v f(x) - {\Psi_x^{\alpha}} v\|_2 \le C r$). Empirical validation on synthetic manifolds (circle, sphere, torus) and facial image denoising demonstrates improved accuracy over prior methods under unbounded Gaussian noise, highlighting the practical resilience and applicability of the approach. The paper thus provides a principled framework for manifold fitting under realistic, unbounded-noise conditions with rigorous convergence and smoothness guarantees.
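As a minimal sketch of the sampling model $x_i = y_i + \xi_i$ described above, the following Python snippet draws noisy samples around a unit circle ($d = 1$, $D = 2$). It assumes that $G_{\sigma}$ is an isotropic Gaussian with standard deviation $\sigma$ in each coordinate; the function name `sample_noisy_circle` and its parameters are hypothetical and are not taken from the paper.

```python
import math
import random


def sample_noisy_circle(n, sigma, seed=0):
    """Draw n samples x_i = y_i + xi_i around the unit circle in R^2.

    Assumptions (illustrative, not from the paper): y_i is uniform on the
    circle and xi_i is isotropic Gaussian with per-coordinate std sigma.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # y_i: uniform point on the latent manifold (the unit circle).
        theta = rng.uniform(0.0, 2.0 * math.pi)
        y = (math.cos(theta), math.sin(theta))
        # xi_i: Gaussian noise with no hard cap on its norm (unbounded).
        xi = (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        samples.append((y[0] + xi[0], y[1] + xi[1]))
    return samples


if __name__ == "__main__":
    pts = sample_noisy_circle(n=1000, sigma=0.1)
    radii = [math.hypot(px, py) for px, py in pts]
    print("largest deviation from the circle:", max(abs(r - 1.0) for r in radii))
```

Because the Gaussian has unbounded support, no fixed tubular neighborhood of the circle is guaranteed to contain every sample, which is the regime the paper targets.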

Abstract

There has been an emerging trend in non-Euclidean statistical analysis toward recovering a low-dimensional structure, namely a manifold, underlying high-dimensional data. Recovering the manifold requires the noise to be sufficiently concentrated. Existing methods address this problem by constructing an approximated manifold based on tangent space estimation at each sample point. Theoretical convergence is guaranteed for these methods, but only when the samples are noiseless or the noise is bounded. If the noise is unbounded, which is a common scenario, the tangent space estimation at the noisy samples will be blurred, and fitting a manifold from the blurred tangent spaces might increase the inaccuracy. In this paper, we introduce a new manifold-fitting method in which the output manifold is constructed by directly estimating the tangent spaces at the projected points on the underlying manifold, rather than at the sample points, to decrease the error caused by the noise. Assuming the noise is unbounded, our new method provides theoretical convergence with high probability, in terms of an upper bound on the distance between the estimated and underlying manifolds. The smoothness of the estimated manifold is also evaluated through an upper bound on the supremum of its second-order difference. Numerical simulations are provided to validate our theoretical findings and demonstrate the advantages of our method over other relevant manifold-fitting methods. Finally, our method is applied to real data examples.

Paper Structure

This paper contains 30 sections, 30 theorems, 171 equations, 11 figures, 1 algorithm.

Key Result

Proposition 2

Figures (11)

  • Figure 1: A toy example illustrating the methods of mohammed2017manifold (left panel) and pmlr-v75-fefferman18a (right panel), where the black curve is a local part of ${\cal M}$, $x$ is a point off ${\cal M}$, and the dots $x_i$ and $x_j$ represent two samples in the neighborhood of $x$. Unlike those in the right panel, the samples in the left panel lie on ${\cal M}$, as mohammed2017manifold focuses on the noiseless case.
  • Figure 2: A toy example illustrating our method. $\Psi_x^\alpha$ is used to estimate the orthogonal projection onto the normal space of ${\cal M}$ at $x^*$, and the black dot $\mathbf{b}$ is used to estimate a point in $T_{x^*}{\cal M}$. The space $\{x^{\prime}: \Psi_x^\alpha(x^{\prime}-\mathbf{b}) = 0\}$, illustrated as the black dashed line, then approximates $T_{x^*}{\cal M}$, and the bias from $x$ to the black dashed line is the estimated bias from $x$ to ${\cal M}$, shown as the black arrow.
  • Figure 3: The dependency of the core theorems, lemmas, and propositions.
  • Figure 4: Diagram of variables used for the discussion of $P_z$.
  • Figure 5: The performance of our method, km17, cf18, ya21(deg=1) and ya21(deg=2) when fitting a circle (top row) and a sphere (bottom row), where black dots represent points in $\tilde{P}$ and red points represent their projections onto ${\cal M}$.
  • ...and 6 more figures
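
The geometric picture in the Figure 2 caption can be sketched numerically. Given a matrix standing in for the normal-space projection and a point $\mathbf{b}$ near the tangent space, the offset of $x$ from the affine space of points $x^{\prime}$ whose difference $x^{\prime}-\mathbf{b}$ is annihilated by the projection is the projection applied to $x-\mathbf{b}$. In the snippet below the projection `psi` and the point `b` are chosen by hand for a unit circle; in the paper they are estimated from data, and that estimation is not reproduced here. The helper names `mat_vec` and `estimated_bias` are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def estimated_bias(psi, x, b):
    """Return psi (x - b), the offset of x from the affine space
    {x' : psi (x' - b) = 0}.

    psi is assumed to be an orthogonal projection onto the (estimated)
    normal space; here it is supplied by the caller, not estimated.
    """
    return mat_vec(psi, [x_k - b_k for x_k, b_k in zip(x, b)])


if __name__ == "__main__":
    # Unit circle near x* = (1, 0): the normal direction is the first axis,
    # so the tangent line is {x' : x'_1 = 1}.
    psi = [[1.0, 0.0], [0.0, 0.0]]  # hand-picked normal-space projection
    b = [1.0, 0.3]                  # hand-picked point on the tangent line
    x = [1.2, 0.0]                  # query point off the circle
    bias = estimated_bias(psi, x, b)
    corrected = [x_k - f_k for x_k, f_k in zip(x, bias)]
    print("bias:", bias)
    print("x minus bias:", corrected)
```

Subtracting the returned vector from `x` lands on the dashed line of the figure, since an orthogonal projection applied twice equals itself.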

Theorems & Definitions (31)

  • Definition 1: Reach
  • Proposition 2
  • Proposition 3
  • Theorem 4
  • Theorem 5
  • Corollary 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 21 more