Manifold learning: what, how, and why

Marina Meilă, Hanyu Zhang

TL;DR

This survey synthesizes the mathematical and statistical foundations of manifold learning, detailing how neighborhood graphs, local linear approximations, and embedding algorithms reveal low-dimensional manifold structure in high-dimensional data. It contrasts one-shot spectral methods (Isomap, Diffusion Maps, Laplacian Eigenmaps, LTSA) with relaxation-based neighbor embeddings (t-SNE, UMAP), highlighting their guarantees, limitations, and susceptibility to distortions such as the Repeated Eigendirections Problem (REP). The work identifies estimation of the Laplace-Beltrami operator, intrinsic dimension estimation, and scale selection as core statistical challenges, and offers practical guidance for applications in statistics and the sciences. Overall, it frames manifold learning as a principled toolkit for visualization, regularization, and scientific discovery, while acknowledging open gaps in isometric embedding and robust dimension inference.

Abstract

Manifold learning (ML), known also as non-linear dimension reduction, is a set of methods to find the low dimensional structure of data. Dimension reduction for large, high dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds, and allow one to visualize, de-noise and interpret them. This survey presents the principles underlying ML, the representative methods, as well as their statistical foundations from a practicing statistician's perspective. It describes the trade-offs, and what theory tells us about the parameter and algorithmic choices we make in order to obtain reliable conclusions.

Paper Structure (30 sections, 13 equations, 6 figures, 1 table, 4 algorithms)

Figures (6)

  • Figure 1: Left: The ethanol molecule has 9 atoms; a spatial configuration of ethanol has $D=3\times 9$ dimensions. The CH$_3$ group (atoms 2,6,7,8) and the OH group (atoms 3,9) can rotate w.r.t. the middle group (atoms 1,4,5), and the blue and orange lines represent these angles of rotation. Right: A 2-manifold estimated from 50,000 configurations of the ethanol molecule. The manifold has the topology of a torus, and the color represents the rotation of the OH group. The sharp "corners" are distortions introduced by the embedding algorithm (explained in Section \ref{sec:graph}). Figure \ref{fig:graph-density-effects} shows the original data. This dataset is from \cite{chmielaTkaSauceSchuPMull:force-fields17}.
  • Figure 2: Embedding algorithms fail to find a full-rank mapping if they greedily select the first $m=2$ eigenvectors; a more refined choice of eigenvectors corrects this. Top row: embeddings of galaxy spectra from the SDSS (Section \ref{sec:appli}) by DM; middle: the "horseshoe" obtained when the first 2 eigenvectors are used; right: the same data, with selection of the second eigenvector (in this case by \cite{yuchaz}). Bottom row: embeddings of a swiss roll with length 7 times the width. Left: first 2 eigenvectors from DM/LE; middle: after UMAP. Note that UMAP by itself is not able to produce a full-rank embedding everywhere; the horseshoe, the two clusters, and the one-dimensional "filament" between them are all artifacts. Right: UMAP with selection of the second eigenvector by \cite{yuchaz}. Plots by Yu-Chia Chen.
  • Figure 3: Embedding obtained from the algorithms in this section on a chopped torus data set with $n=14,519$ points. This manifold cannot be embedded isometrically in $d=2$ dimensions.
  • Figure 4: Effects of graph construction and renormalization when the sampling density is highly non-uniform, exemplified on the configurations of the ethanol molecule. Left: the original data, after preprocessing, is a noisy torus with three regions of high density around local minima of the potential energy. Center: Embeddings by DM (purple), and by the same algorithm with $\mathbf{L}$ constructed from the $k$-nearest neighbor graph (yellow). The sparse regions are stretched, while the dense regions appear as "corners" of the embedding. Note that DM should remove the effects of the density; in this case, the variations in density are so extreme that the effect persists. The effect is somewhat stronger for the $k$-nearest neighbor graph. Right: Embedding by DM (purple) and by LE (yellow), which uses the singly normalized $\mathbf{L}^{rw}$.
  • Figure 5: The embeddings from Figure \ref{fig:chopped_torus}, with the distortion $\tilde{\mathbf{h}}$ estimated at a random subset of points.
  • ...and 1 more figures
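
The failure described in the Figure 2 caption can be reproduced on a small scale. The sketch below is an illustration under stated assumptions, not the authors' code: it uses a flat 7-by-1 strip sampled on a regular grid (instead of a rolled-up swiss roll), a Gaussian kernel with a hand-picked bandwidth `eps`, and a Diffusion Maps style density renormalization. The helper names `dm_eigenvectors` and `poly_r2` are hypothetical, and the polynomial-fit criterion used to pick the second coordinate is a crude stand-in for the selection method cited in the caption.

```python
import numpy as np

def dm_eigenvectors(X, eps, m):
    """First m non-constant eigenvectors of a Diffusion Maps style random walk."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps**2)                 # Gaussian similarity graph
    q = K.sum(axis=1)
    K = K / np.outer(q, q)                   # renormalize by the degrees (density estimate)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))          # symmetric matrix similar to D^{-1} K
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]           # largest eigenvalue of S first
    vecs = vecs[:, order] / np.sqrt(d)[:, None]   # map back to random-walk eigenvectors
    return vecs[:, 1:m + 1]                  # drop the constant eigenvector

def poly_r2(u, v, degree=10):
    """R^2 of a polynomial fit of v on u; a value near 1 means v is a function of u."""
    u = u / np.abs(u).max()                  # rescale to [-1, 1] for a stable fit
    resid = v - np.polyval(np.polyfit(u, v, degree), u)
    return 1.0 - resid.var() / v.var()

# Assumed toy data: a flat strip with length 7 times its width.
gx, gy = np.meshgrid(np.linspace(0, 7, 70), np.linspace(0, 1, 10))
X = np.column_stack([gx.ravel(), gy.ravel()])

Phi = dm_eigenvectors(X, eps=0.15, m=20)

# How well is each later eigenvector explained by the first one?
r2 = np.array([poly_r2(Phi[:, 0], Phi[:, j]) for j in range(1, Phi.shape[1])])

# Crude selection: the first eigenvector that is NOT a function of Phi[:, 0].
second = 1 + int(np.argmax(r2 < 0.5))

greedy = Phi[:, [0, 1]]        # first two eigenvectors: a "horseshoe" curve
selected = Phi[:, [0, second]] # two independent coordinates: a 2D embedding
print("R^2 of eigenvector 2 given eigenvector 1:", round(float(r2[0]), 3))
print("index of the selected second coordinate:", second)
```

Plotting `greedy` should show the points collapsed onto a curve, because on a long strip the second eigenvector varies along the same direction as the first. Plotting `selected` should recover a two-dimensional patch, since the chosen coordinate varies across the width of the strip.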