Inductive Global and Local Manifold Approximation and Projection

Jungeum Kim, Xiao Wang

TL;DR

GLoMAP unifies global and local manifold learning by constructing a locally adaptive global distance from the local estimates $\hat{d}_{\rm loc}$ using kNN scales and then merging them via shortest paths on a graph, with a tempering schedule that reveals global structure before local details. The inductive variant iGLoMAP adds a mapper $Q_\theta$ to produce embeddings for unseen data, trained with a particle-based scheme that preserves transductive stability while enabling generalization. Theoretical results guarantee that the local distance estimator is consistent with the geodesic distance on a manifold, and that the global metric space can be formed as an extended metric via a coequalizer-like construction. Empirically, GLoMAP and iGLoMAP achieve competitive performance on simulated and real datasets (e.g., MNIST, Spheres, hierarchical data), demonstrating strong global-to-local structure preservation and scalable inductive embeddings for large-scale dimensional reduction tasks.
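
The shortest-path merging step can be illustrated with a minimal sketch. This is not the paper's implementation: the local distance used here (Euclidean distance divided by the geometric mean of the two endpoints' k-th-neighbor distances) is an assumed stand-in for $\hat{d}_{\rm loc}$, and the function names `knn_local_graph` and `shortest_path_distances` are hypothetical. The sketch only shows how local edge weights on a kNN graph extend to a global distance, and why that distance is an extended metric: points in disconnected components are infinitely far apart.

```python
import heapq
import math


def knn_local_graph(points, k):
    """Build a symmetric kNN graph with locally rescaled edge weights.

    ASSUMPTION: the edge weight is Euclidean distance divided by the
    geometric mean of the endpoints' k-th-neighbor distances. This is an
    illustrative stand-in, not the paper's local distance estimator.
    Points are assumed to be distinct, so every scale is positive.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # order[i] lists the other indices sorted by distance from point i.
    order = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for i in range(n)
    ]
    # scale[i] is the distance from point i to its k-th nearest neighbor.
    scale = [dist[i][order[i][k - 1]] for i in range(n)]
    graph = {i: {} for i in range(n)}
    for i in range(n):
        for j in order[i][:k]:
            w = dist[i][j] / math.sqrt(scale[i] * scale[j])
            graph[i][j] = w
            graph[j][i] = w  # keep the graph undirected
    return graph


def shortest_path_distances(graph, source):
    """Dijkstra's algorithm; unreachable nodes keep distance infinity."""
    best = {node: math.inf for node in graph}
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best[v]:
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best


# Two well-separated groups on a line; with k=1 the graph has two components.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,)]
graph = knn_local_graph(points, k=1)
d_glo = shortest_path_distances(graph, source=0)
```

In this toy example, `d_glo[2]` is the sum of two local edges, while `d_glo[3]` is `math.inf` because no path joins the two groups.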

Abstract

Nonlinear dimensional reduction with the manifold assumption, often called manifold learning, has proven its usefulness in a wide range of high-dimensional data analyses. The significant impact of t-SNE and UMAP has catalyzed intense research interest, seeking further innovations toward visualizing not only the local but also the global structure information of the data. Moreover, there have been consistent efforts toward generalizable dimensional reduction that handles unseen data. In this paper, we first propose GLoMAP, a novel manifold learning method for dimensional reduction and high-dimensional data visualization. GLoMAP preserves locally and globally meaningful distance estimates and displays a progression from global to local formation during the course of optimization. Furthermore, we extend GLoMAP to its inductive version, iGLoMAP, which utilizes a deep neural network to map data to its lower-dimensional representation. This allows iGLoMAP to provide lower-dimensional embeddings for unseen points without needing to re-train the algorithm. iGLoMAP is also well-suited for mini-batch learning, enabling large-scale, accelerated gradient calculations. We have successfully applied both GLoMAP and iGLoMAP to simulated and real-data settings, with competitive experiments against state-of-the-art methods.

Paper Structure

This paper contains 43 sections, 6 theorems, 46 equations, 28 figures, 3 algorithms.

Key Result

Proposition 1

An unbiased estimator of $\mathcal{L}_{\rm glo}(Z)$, up to a multiplicative constant, is [display equation missing from the source], where $\mu_{i\cdot}=\sum_{j=1}^n \mu_{ij}$, $S$ is an index set sampled uniformly from $I=\{1,\dots,n\}$, and $j_i$ is sampled from the conditional distribution $\mu_{j\vert i}=\mu_{ij}/\sum_{j'=1}^n \mu_{ij'}.$
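
The sampling scheme named in Proposition 1 can be sketched as follows. The sketch only draws the indices: a uniform index set $S$ and, for each $i \in S$, a partner $j_i$ drawn with probability $\mu_{j\vert i}$. It does not compute the estimator itself, whose formula is not given here. The function name `sample_pairs`, the choice to sample $S$ without replacement, and the toy matrix `mu` are assumptions made for illustration.

```python
import random


def sample_pairs(mu, batch_size, rng):
    """Draw (i, j_i, mu_i_dot) triples for a mini-batch.

    mu is an n-by-n list of nonnegative weights; every row is assumed to
    have a positive sum so the conditional distribution is well defined.
    ASSUMPTION: S is drawn uniformly without replacement.
    """
    n = len(mu)
    S = rng.sample(range(n), batch_size)
    triples = []
    for i in S:
        mu_i_dot = sum(mu[i])  # row sum, the normalizer of mu_{j|i}
        # random.choices normalizes the weights, so j is drawn with
        # probability mu[i][j] / mu_i_dot.
        j = rng.choices(range(n), weights=mu[i], k=1)[0]
        triples.append((i, j, mu_i_dot))
    return triples


# Toy symmetric weight matrix with a zero diagonal (illustrative values).
mu = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.0],
    [0.1, 0.0, 0.0],
]
rng = random.Random(0)
pairs = sample_pairs(mu, batch_size=2, rng=rng)
```

Because zero-weight entries are never selected, every sampled pair `(i, j)` satisfies `mu[i][j] > 0`, so a point is only ever paired with one of its weighted neighbors.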

Figures (28)

  • Figure 1: Visualization of the spheres dataset [topoAE] by UMAP and GLoMAP (both transductive). Bottom: GLoMAP shows a progression of the representation from global to local during the optimization. All ten inner clusters are identified, as well as the larger cluster on the outer shell (purple).
  • Figure 2: Transductive visualization of the hierarchical dataset [pac_map]. Top: The results of UMAP. Bottom: Starting from a random initialization, GLoMAP first finds the macro clusters, then the meso clusters, and then all the micro clusters, in a progressive way during the optimization.
  • Figure 3: The visualization and generalization performance of iGLoMAP on the hierarchical and spheres datasets. Spheres: All ten inner clusters are identified, with the outer shell (purple) scattered around them. Hierarchical: All levels of clusters are identified.
  • Figure 4: The effect of equivalence relation as a coequalizer
  • Figure 5: The 3D datasets are generated from the displayed 2D rectangles. Left: S-curve dataset, Middle: Severed Sphere dataset, Right: Eggs dataset.
  • ...and 23 more figures

Theorems & Definitions (12)

  • Proposition 1
  • Example 1
  • Remark 1
  • Remark 2
  • Example 2
  • Remark 3
  • Theorem 1
  • Theorem 2
  • Proposition 2
  • Theorem 3
  • ...and 2 more