How well behaved is finite dimensional Diffusion Maps?
Wenyu Bo, Marina Meilă
TL;DR
This work analyzes finite-dimensional Diffusion Maps (DM) of a well-behaved family of manifolds embedded in Euclidean space and establishes that DM preserves key geometric properties such as near-uniform density, uniform polynomial approximation and reach, and controlled curvature. By leveraging these properties, the authors derive a finite-sample embedding error bound $O\left((\frac{\log n}{n})^{\frac{1}{8d+16}}\right)$ and a tangent-space estimation bound $\sup_{P\in \mathcal{P}} \mathbb{E}_{P^{\otimes \tilde{n}}} \max_{1\le j\le \tilde{n}} \angle\left(T_{Y_{\varphi(M),j}}\varphi(M),\hat{T}_j\right)\le C\left(\frac{\log n}{n}\right)^{\frac{k-1}{(8d+16)k}}$, where $\tilde{n}$ scales judiciously with $n$. The analysis ties together spectral convergence of DM, pushforward density regularity, and geometric quantities like reach and injectivity radius, culminating in a rigorous description of the geometric accuracy of DM embeddings. The results provide a theoretical foundation for reliable DM-based dimensionality reduction and tangent-space estimation in applications, with explicit rates depending on the manifold dimension $d$, smoothness $k$, and sample size $n$. Overall, the paper advances the understanding of DM stability in finite-sample regimes and informs practical choices of embedding dimension and sample allocation.
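The two rates above are explicit functions of $n$, $d$, and $k$, so a small helper makes their scaling concrete. This is only a sketch: the constants in the bounds (including $C$) are not given here and are set to 1, and the function names `dm_embedding_rate` and `tangent_rate` are hypothetical.

```python
import math


def dm_embedding_rate(n: int, d: int) -> float:
    """Embedding-error rate (log n / n)^(1/(8d+16)), with the hidden constant set to 1."""
    return (math.log(n) / n) ** (1.0 / (8 * d + 16))


def tangent_rate(n: int, d: int, k: int) -> float:
    """Tangent-angle rate (log n / n)^((k-1)/((8d+16)k)), with C set to 1."""
    return (math.log(n) / n) ** ((k - 1) / ((8 * d + 16) * k))


if __name__ == "__main__":
    # Compare both rates for a 2-dimensional manifold with smoothness k = 3.
    for n in (10**3, 10**6, 10**9):
        print(f"n={n:>10}  embedding={dm_embedding_rate(n, 2):.3f}  "
              f"tangent={tangent_rate(n, 2, 3):.3f}")
```

With $d = 2$ and the constant set to 1, the embedding rate at $n = 10^6$ is still about 0.70, which shows how slowly the bound tightens. Because $\log n / n < 1$ and $(k-1)/k < 1$, the tangent-space rate is never smaller than the embedding rate for the same $n$ and $d$.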
Abstract
Under a set of assumptions on a family of submanifolds of $\mathbb{R}^D$, we derive a series of geometric properties that remain valid after a finite-dimensional, almost isometric Diffusion Maps (DM) embedding, including almost uniform density, finite polynomial approximation, and reach. Leveraging these properties, we establish a rigorous bound of $O\left((\frac{\log n}{n})^{\frac{1}{8d+16}}\right)$ on the embedding error introduced by the DM algorithm. Furthermore, we quantify the error between the estimated tangent spaces $\hat{T}_j$ and the true tangent spaces of the submanifolds after the DM embedding, $\sup_{P\in \mathcal{P}}\mathbb{E}_{P^{\otimes \tilde{n}}} \max_{1\leq j\leq \tilde{n}} \angle\left(T_{Y_{\varphi(M),j}}\varphi(M),\hat{T}_j\right) \leq C \left(\frac{\log n}{n}\right)^{\frac{k-1}{(8d+16)k}}$, which provides a precise characterization of the geometric accuracy of the embeddings. These results offer a solid theoretical foundation for understanding the performance and reliability of DM in practical applications.
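To make the quantities in the tangent-space bound concrete, here is a minimal NumPy sketch of the pipeline it measures: a standard Coifman–Lafon diffusion map (Gaussian kernel with $\alpha$-normalization), local PCA to estimate $\hat{T}_j$ at each embedded point, and the largest principal angle as $\angle$. This is an illustration under stated assumptions, not the authors' construction: the kernel variant, the bandwidth `eps`, the neighbor count, and the helper names `diffusion_map`, `local_pca_tangents`, and `max_principal_angle` are all hypothetical choices, and the paper's estimator and choice of $\tilde{n}$ may differ.

```python
import numpy as np


def diffusion_map(X, eps, n_components, t=1, alpha=1.0):
    """Standard diffusion map (assumed variant): Gaussian kernel + alpha-normalization."""
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    # alpha = 1 removes the influence of the sampling density on the limit operator.
    q = K.sum(axis=1)
    K = K / np.outer(q ** alpha, q ** alpha)
    deg = K.sum(axis=1)
    # Symmetric conjugate of the Markov matrix P = diag(deg)^-1 K, so eigh applies.
    S = K / np.sqrt(np.outer(deg, deg))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Right eigenvectors of P; column 0 is the constant vector and is dropped.
    psi = evecs / np.sqrt(deg)[:, None]
    return psi[:, 1:n_components + 1] * evals[1:n_components + 1] ** t


def local_pca_tangents(Y, d, n_neighbors):
    """Orthonormal basis (as columns) of an estimated d-dim tangent space at each point."""
    sq = np.sum(Y ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T
    bases = []
    for i in range(len(Y)):
        nbrs = Y[np.argsort(D2[i])[:n_neighbors]]
        centered = nbrs - nbrs.mean(axis=0)
        # Top-d right singular vectors span the local principal directions.
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        bases.append(Vt[:d].T)
    return bases


def max_principal_angle(U, V):
    """Largest principal angle between the column spans of orthonormal U and V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))


if __name__ == "__main__":
    # Toy example: equally spaced points on the unit circle (d = 1).
    theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
    X = np.column_stack([np.cos(theta), np.sin(theta)])
    Y = diffusion_map(X, eps=0.05, n_components=2)
    # Here the embedding is again a circle centred at 0, so the true tangent is Y rotated by 90 degrees.
    true = [np.array([[-y[1]], [y[0]]]) / np.linalg.norm(y) for y in Y]
    est = local_pca_tangents(Y, d=1, n_neighbors=11)
    print("max tangent angle:", max(max_principal_angle(a, b) for a, b in zip(true, est)))
```

The circle is chosen only because the true tangent of the embedded manifold is known in closed form there. For a general $M$ the true tangent space of $\varphi(M)$ would have to be computed from the differential of the embedding.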
