
Robust Tangent Space Estimation via Laplacian Eigenvector Gradient Orthogonalization

Dhruv Kohli, Sawyer J. Robertson, Gal Mishne, Alexander Cloninger

TL;DR

This work addresses the fragile nature of local tangent-space estimation under noise by introducing LEGO, a spectral method that leverages the gradients of low-frequency global graph Laplacian eigenvectors to robustly estimate tangent spaces. The authors provide two theoretical foundations: a differential-geometric analysis on tubular neighborhoods showing low-frequency eigenfunctions align with the tangent bundle, and a random-matrix analysis establishing noise-robust convergence of the Laplacian and its eigenvectors. Empirically, LEGO consistently outperforms LPCA in noisy settings and delivers tangible gains across manifold learning, boundary detection, and local intrinsic-dimension estimation, including accurate torus-structured embeddings via tear-based alignment. Together, these results demonstrate that exploiting global geometric information via Laplacian eigenvectors yields more reliable local geometry, with broad practical implications for downstream data analysis tasks.
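
The claim that low-frequency eigenvectors track the manifold rather than the noise can be checked with a small experiment. The sketch below builds a kNN graph Laplacian on a circle with noise in the normal (radial) direction. It then measures how much of $\cos\theta$ and $\sin\theta$, which vary only along the circle, lies in the span of the first two non-constant eigenvectors. The graph construction (Gaussian weights with a median-distance bandwidth, unnormalized Laplacian $L = D - W$) and all function names are assumptions for illustration, not the paper's setup. Figure 1, for instance, uses a diffusion-maps Laplacian.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn_graph_laplacian(X, k):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph.

    Assumption: Gaussian edge weights with bandwidth equal to the median
    kNN distance; one common choice, not necessarily the paper's.
    """
    n = X.shape[0]
    dist, idx = cKDTree(X).query(X, k=k + 1)  # column 0 is the point itself
    dist, idx = dist[:, 1:], idx[:, 1:]
    sigma = np.median(dist)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = np.exp(-(dist.ravel() / sigma) ** 2)
    W = np.maximum(W, W.T)  # symmetrize: keep an edge if either endpoint chose it
    return np.diag(W.sum(axis=1)) - W


def low_frequency_eigenvectors(X, k=10, n_eig=5):
    """Return the n_eig smallest eigenpairs of the kNN graph Laplacian."""
    evals, evecs = np.linalg.eigh(knn_graph_laplacian(X, k))  # ascending order
    return evals[:n_eig], evecs[:, :n_eig]


# Noisy circle: radial (normal-direction) noise on a uniform angle grid.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
radius = 1.0 + 0.05 * rng.standard_normal(theta.size)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

evals, V = low_frequency_eigenvectors(X, k=10, n_eig=3)
U = V[:, 1:3]  # first two non-constant eigenvectors (orthonormal columns)
for name, f in [("cos", np.cos(theta)), ("sin", np.sin(theta))]:
    f = f - f.mean()
    captured = np.linalg.norm(U @ (U.T @ f)) / np.linalg.norm(f)
    print(f"fraction of {name}(theta) captured by eigenvectors 1-2: {captured:.3f}")
```

A fraction close to 1 means that the lowest non-trivial eigenvectors vary along the circle and are largely insensitive to the radial noise. In this sketch, modes that oscillate across the noisy band would need a much larger eigenvalue.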

Abstract

Estimating the tangent spaces of a data manifold is a fundamental problem in data analysis. The standard approach, Local Principal Component Analysis (LPCA), struggles in high-noise settings because of a critical trade-off in choosing the neighborhood size: neighborhoods that are too small are dominated by noise, while neighborhoods that are too large are biased by curvature. Selecting an optimal size requires prior knowledge of the geometric and noise characteristics of the data, which is often unavailable. In this paper, we propose a spectral method, Laplacian Eigenvector Gradient Orthogonalization (LEGO), that uses the global structure of the data to guide local tangent space estimation. Instead of relying solely on local neighborhoods, LEGO estimates the tangent space at each data point by orthogonalizing the gradients of low-frequency eigenvectors of the graph Laplacian. We provide two theoretical justifications for our method. First, a differential-geometric analysis on a tubular neighborhood of a manifold shows that the gradients of the low-frequency Laplacian eigenfunctions of the tube align closely with the manifold's tangent bundle, while eigenfunctions with high gradients in directions orthogonal to the manifold lie deeper in the spectrum. Second, a random-matrix-theoretic analysis demonstrates that low-frequency eigenvectors are robust to sub-Gaussian noise. Through comprehensive experiments, we demonstrate that LEGO yields tangent space estimates that are significantly more robust to noise than those from LPCA, resulting in marked improvements in downstream tasks such as manifold learning, boundary detection, and local intrinsic dimension estimation.
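
The two estimators contrasted above can be sketched in NumPy/SciPy as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. The Gaussian-weighted kNN graph, the unnormalized Laplacian, the local least-squares gradient fit, and the function names `lpca_tangents` and `lego_style_tangents` are all hypothetical choices. This summary does not spell out the roles of the paper's parameters $m_0$ and $m$, so the sketch uses a single hypothetical parameter `n_eig` (the number of non-constant eigenvectors) instead.

```python
import numpy as np
from scipy.spatial import cKDTree


def knn(X, k):
    """Distances and indices of the k nearest neighbors (self excluded)."""
    dist, idx = cKDTree(X).query(X, k=k + 1)
    return dist[:, 1:], idx[:, 1:]


def lpca_tangents(X, k, d):
    """LPCA baseline: top-d principal directions of each point's kNN patch."""
    _, idx = knn(X, k)
    bases = []
    for i in range(X.shape[0]):
        patch = X[np.append(idx[i], i)]
        patch = patch - patch.mean(axis=0)
        _, _, Vt = np.linalg.svd(patch, full_matrices=False)
        bases.append(Vt[:d].T)  # D x d orthonormal basis
    return np.stack(bases)  # n x D x d


def lego_style_tangents(X, k, d, n_eig):
    """Sketch in the spirit of LEGO: orthogonalize local gradient estimates
    of the n_eig lowest non-constant graph Laplacian eigenvectors.

    Assumptions (not taken from the paper): Gaussian-weighted kNN graph,
    unnormalized Laplacian, gradients fitted by local least squares.
    """
    n = X.shape[0]
    dist, idx = knn(X, k)
    sigma = np.median(dist)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), idx.ravel()] = np.exp(-(dist.ravel() / sigma) ** 2)
    W = np.maximum(W, W.T)  # symmetrize the kNN graph
    _, evecs = np.linalg.eigh(np.diag(W.sum(axis=1)) - W)
    Phi = evecs[:, 1:n_eig + 1]  # n x n_eig, constant eigenvector dropped

    bases = []
    for i in range(n):
        dX = X[idx[i]] - X[i]        # k x D neighbor displacements
        dPhi = Phi[idx[i]] - Phi[i]  # k x n_eig eigenvector increments
        # Solve dX @ G ~= dPhi; column j of G estimates the gradient of eigenvector j.
        G, *_ = np.linalg.lstsq(dX, dPhi, rcond=None)  # D x n_eig
        U, _, _ = np.linalg.svd(G, full_matrices=False)
        bases.append(U[:, :d])  # dominant gradient directions, orthonormalized
    return np.stack(bases)  # n x D x d
```

With `X` a noisy sample of a curve or surface, `lpca_tangents(X, k, d)` and `lego_style_tangents(X, k, d, n_eig)` both return an `n x D x d` array of orthonormal bases. The design difference lies in where noise enters. LPCA takes the principal directions of the noisy patch itself. The LEGO-style sketch instead uses the patch only to fit how the smooth, globally computed eigenvectors change, so displacements in the noise direction that leave those eigenvectors unchanged contribute little to the fitted gradients.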

Paper Structure

This paper contains 17 sections, 13 theorems, 104 equations, 7 figures, 1 algorithm.

Key Result

Lemma 1

The pullback metric $g^{\varepsilon} = \mathcal{D}_{\varepsilon}^{*} g = \mathcal{D}_{\varepsilon}^{*} \Psi^{*} \delta_{d+k}$ with respect to the coordinate vector fields $\{\partial^H_{1}|_{(x,n)}, \ldots, \partial^H_{d}|_{(x,n)}, \partial_{d\ldots}$ … Consequently, the Riemannian gradient of $\widehat{\phi} \in C_0^{\infty}(NB^{r})$

Figures (7)

  • Figure 1: Illustration of tangent space estimation using LPCA and LEGO on a noisy point cloud generated by non-uniform sampling of a closed curve (a wave on a circle), with heteroskedastic noise added in the normal direction. (a) Clean data points with ground-truth tangent vectors, along with tangent vectors estimated from the noisy data using LPCA ($k_{\mathrm{nn}} = 14$, $d = 1$) and LEGO ($k_{\mathrm{nn}} = 14$, $m_0 = 20$, $m = 100$, $d = 1$). (b) Cosine dissimilarity between the true and the estimated tangent vectors. (c) Eigenvectors of the graph Laplacian constructed from the noisy data [diffusionmaps], showing that those with high gradient in the noise direction lie deeper in the spectrum.
  • Figure 2:
  • Figure 3: (a) Clean and noisy Swiss roll with a high aspect ratio in $\mathbb{R}^3$, colored by the "roll" parameter. (b) Discrepancy between the true and the estimated tangent spaces for LPCA ($k_{\mathrm{nn}} = 9$) and LEGO ($k_{\mathrm{nn}} = 9$, $m_0 = 100$, $m = 40$), computed using Eq. \ref{eq:TBDiscrep}. (c, d) $2$-dimensional parameterization of the noisy data, and the boundary points detected from the noisy data using the estimated and the true tangent spaces (see Sections \ref{subsec:manifold_learning} and \ref{subsec:boundary_detection} for details). (e) The functional variance explained by each of the three principal directions in LPCA and LEGO (see Section \ref{subsec:local_intrinsic_dimension}).
  • Figure 4: (a) Clean and noisy truncated torus in $\mathbb{R}^3$, colored by the noise level. (b) Discrepancy between the true and the estimated tangent spaces for LPCA ($k_{\mathrm{nn}} = 14$) and LEGO ($k_{\mathrm{nn}} = 14$, $m_0 = 100$, $m = 20$), computed using Eq. \ref{eq:TBDiscrep}. (c, d) $2$-dimensional parameterization of the noisy data, and the boundary points detected from the noisy data using the estimated and the true tangent spaces (see Sections \ref{subsec:manifold_learning} and \ref{subsec:boundary_detection} for details). (e) The functional variance explained by each of the three principal directions in LPCA and LEGO (see Section \ref{subsec:local_intrinsic_dimension}).
  • Figure 5: (a) Sample clean images from the Yoda and Bulldog dataset [lederman2018learning] (first and third columns), along with their noise-perturbed versions (second and fourth columns). (b) Explained variance ratio for the first $30$ principal directions obtained via PCA. Since the explained variance saturates after $10$ dimensions, we project the noisy images into $\mathbb{R}^{10}$ using PCA. (c) Visualization of the noisy data using its first three principal components; the colorbar corresponds to the third component. (d) Two-dimensional torn embeddings of the noisy data using the estimated tangent spaces (see Section \ref{subsec:manifold_learning} and [ratsv2] for details). (e) The torn $2$-dimensional embedding obtained from the LEGO estimates, equipped with gluing instructions that identify same-colored points along the tear, reveals a toroidal topology. The corresponding clean images along opposite edges further confirm this structure. (f) Functional variance explained by each of the $10$ principal directions obtained from LPCA and LEGO (see Section \ref{subsec:local_intrinsic_dimension}).
  • ...and 2 more figures
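
Figure 1(b) reports a cosine dissimilarity, and Figures 3(b) and 4(b) report a tangent-space discrepancy defined by the equation labeled eq:TBDiscrep, whose exact form is not reproduced in this summary. The sketch below shows two standard ways to compare an estimated tangent basis with the true one, assuming orthonormal bases. The first is the sign-invariant cosine dissimilarity $1 - |\cos\angle(u, v)|$ for lines ($d = 1$). The second is a principal-angle-based projection distance for $d$-dimensional subspaces. Whether either matches the paper's equation is an assumption, and the function names are hypothetical.

```python
import numpy as np


def principal_angles(U, V):
    """Principal angles between span(U) and span(V).

    U, V: D x d arrays with orthonormal columns. The singular values of
    U^T V are the cosines of the principal angles.
    """
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))


def cosine_dissimilarity(u, v):
    """1 - |cos(angle)| between the lines spanned by u and v (sign-invariant)."""
    return 1.0 - abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def projection_discrepancy(U, V):
    """||U U^T - V V^T||_F / sqrt(2), which equals sqrt(sum_i sin^2(theta_i))
    for orthonormal U, V of equal rank."""
    return np.linalg.norm(U @ U.T - V @ V.T) / np.sqrt(2.0)
```

For a tangent line, both measures depend only on the angle between the estimated and true directions. For planes in $\mathbb{R}^3$, such as the Swiss roll and torus examples, the projection distance aggregates both principal angles into a single number per point.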

Theorems & Definitions (24)

  • Lemma 1
  • Lemma 2
  • Theorem 3
  • Corollary 4
  • Remark 5
  • Theorem 6
  • Corollary 7
  • Lemma 8
  • Remark 9
  • Theorem 10
  • ...and 14 more