Relating tSNE and UMAP to Classical Dimensionality Reduction

Andrew Draganov; Simon Dohn

TL;DR

This work addresses the explainability gap of gradient-based DR methods (tSNE/UMAP) by connecting them to classical techniques. It formalizes an attraction/repulsion DR (ARDR) framework and shows that PCA, MDS, and Isomap can be recovered within this paradigm by applying attractions and repulsions to a randomly initialized dataset; for PCA, the gradient is $\nabla^{PCA} = -4 C(G_X - G_Y) C Y$. The authors further show that UMAP outputs can be reproduced by a classical method, double-kernel LLE (DK-LLE), a variant of LLE that uses two kernels, and they establish empirical and theoretical links between UMAP and DK-LLE, including shared neighborhood-preservation behavior. They argue that UMAP embeddings implicitly preserve local neighborhoods under the input and output kernels, and they conjecture that UMAP achieves a constant-factor approximation to the DK-LLE objective, which would offer a pathway to interpretable explanations of UMAP outputs. The work discusses practical implications for interpreting embeddings and suggests future directions for relating modern ARDR methods to classical, explainable DR techniques, potentially enabling rigorous guarantees about the high-dimensional structures these embeddings summarize.
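The PCA gradient quoted above can be sanity-checked numerically. The sketch below assumes the usual definitions, which this summary does not spell out: $C = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$ is the centering matrix and $G_X = XX^\top$, $G_Y = YY^\top$ are Gram matrices. It compares the closed-form gradient of $\|C(G_X - G_Y)C\|_F^2$ with central finite differences; the helper names (`pca_loss`, `pca_grad`, `finite_difference_grad`) are hypothetical.

```python
import numpy as np

def centering_matrix(n):
    # Assumed definition: C = I - (1/n) * 1 1^T
    return np.eye(n) - np.ones((n, n)) / n

def pca_loss(X, Y):
    # L^PCA(X, Y) = ||C (G_X - G_Y) C||_F^2, with G_X = X X^T and G_Y = Y Y^T
    C = centering_matrix(X.shape[0])
    M = C @ (X @ X.T - Y @ Y.T) @ C
    return float(np.sum(M ** 2))

def pca_grad(X, Y):
    # Closed form: grad = -4 C (G_X - G_Y) C Y
    C = centering_matrix(X.shape[0])
    return -4.0 * C @ (X @ X.T - Y @ Y.T) @ C @ Y

def finite_difference_grad(X, Y, eps=1e-5):
    # Central differences, perturbing one coordinate of Y at a time
    grad = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        Y_plus, Y_minus = Y.copy(), Y.copy()
        Y_plus[idx] += eps
        Y_minus[idx] -= eps
        grad[idx] = (pca_loss(X, Y_plus) - pca_loss(X, Y_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))   # 20 input points in 5 dimensions
Y = rng.normal(size=(20, 2))   # a random 2-D embedding
analytic = pca_grad(X, Y)
numeric = finite_difference_grad(X, Y)
print(np.max(np.abs(analytic - numeric)))
```

Because $C$ is symmetric and idempotent, the two factors of $C$ that appear when differentiating collapse into the single $C$ on each side shown in the formula.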

Abstract

It has become standard to use gradient-based dimensionality reduction (DR) methods like tSNE and UMAP when explaining what AI models have learned. This makes sense: these methods are fast, robust, and have an uncanny ability to find semantic patterns in high-dimensional data without supervision. Despite this, gradient-based DR methods lack the most important quality that an explainability method should possess: being explainable themselves. That is, given a UMAP output, it is currently unclear what one can say about the corresponding input. We work towards closing this gap by relating UMAP to classical DR techniques. Specifically, we show that one can fully recover methods like PCA, MDS, and ISOMAP in the modern DR paradigm, i.e., by applying attractions and repulsions to a randomly initialized dataset. We also show that, with a small change, Locally Linear Embeddings (LLE) can reproduce UMAP outputs indistinguishably. This implies that the UMAP effective objective is minimized by this modified version of LLE (and vice versa). Given this, we discuss what must be true of UMAP embeddings and present avenues for future work.
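For reference, the standard closed-form versions of two of the classical methods named above are short. The sketch below uses textbook definitions, not code from the paper: classical MDS double-centers the squared distance matrix, $B = -\tfrac{1}{2} C D^{\circ 2} C$, and embeds with the top-$k$ eigenpairs of $B$; Isomap applies the same step to shortest-path distances on a $k$-nearest-neighbor graph. The `p=1` option mirrors the $L_1$-distance used for classical MDS in Figure 2. All function names are hypothetical.

```python
import numpy as np

def pairwise_distances(X, p=2):
    # p=1: L1 (Manhattan) distances; p=2: Euclidean distances
    diff = np.abs(X[:, None, :] - X[None, :, :])
    return diff.sum(axis=-1) if p == 1 else np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k):
    # Double-center the squared distances and keep the top-k eigenpairs
    n = D.shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * C @ (D ** 2) @ C
    evals, evecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]
    # Negative eigenvalues (possible for non-Euclidean D such as L1) are clipped to 0
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

def geodesic_distances(X, n_neighbors):
    # Shortest paths on a symmetrized kNN graph via Floyd-Warshall
    D = pairwise_distances(X, p=2)
    n = D.shape[0]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # skip the point itself
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]
    for m in range(n):
        G = np.minimum(G, G[:, [m]] + G[[m], :])
    return G

def isomap(X, k, n_neighbors=5):
    return classical_mds(geodesic_distances(X, n_neighbors), k)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
Y_mds_l1 = classical_mds(pairwise_distances(X, p=1), k=2)   # L1 setting

# Euclidean check: -1/2 C D^2 C equals C G_X C, so classical MDS reproduces PCA
n = X.shape[0]
C = np.eye(n) - np.ones((n, n)) / n
B_euclid = -0.5 * C @ (pairwise_distances(X, p=2) ** 2) @ C

# Isomap on points along a semicircle should unroll it into a line
t = np.linspace(0.0, np.pi, 40)
arc = np.stack([np.cos(t), np.sin(t)], axis=1)
Y_iso = isomap(arc, k=1, n_neighbors=2)
```

The dense Floyd-Warshall loop is $O(n^3)$ and is only meant for toy inputs; it assumes the kNN graph is connected.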

Paper Structure (35 sections, 5 theorems, 35 equations, 8 figures)

Introduction
Our Contributions
Preliminaries and Related Work
Classical Methods
PCA, MDS and Isomap.
Locally Linear Embedding (LLE).
Gradient Dimensionality Reduction Methods
In theory
ARDR methods
Generalization
In practice
Related Work
Classical Methods in the ARDR Framework
PCA as Attractions and Repulsions
PCA Convergence
...and 20 more sections

Key Result

Lemma 3.1

The minimum of $\mathcal{L}^{PCA}(\mathbf{X}, \mathbf{Y}) = \| \mathbf{C} (\mathbf{G}_X - \mathbf{G}_Y) \mathbf{C} \|_F^2$ is attained only when $\mathbf{Y}$ is the PCA projection of $\mathbf{X}$, up to an orthogonal transformation.
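Lemma 3.1 can be checked on a toy example. The sketch below assumes $\mathbf{C} = \mathbf{I} - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$, $\mathbf{G}_X = \mathbf{X}\mathbf{X}^\top$ and $\mathbf{G}_Y = \mathbf{Y}\mathbf{Y}^\top$, and uses plain gradient descent with hand-picked, hypothetical hyperparameters (`lr`, `n_iter`) rather than the paper's optimizer. It checks that the PCA projection attains $\sum_{i>k}\sigma_i^4$ (the squared discarded eigenvalues of $\mathbf{C}\mathbf{G}_X\mathbf{C}$, with $\sigma_i$ the singular values of the centered data), that an orthogonal rotation of it attains the same value, and that gradient descent from a random initialization reaches it too.

```python
import numpy as np

def centering_matrix(n):
    return np.eye(n) - np.ones((n, n)) / n

def pca_loss(X, Y):
    C = centering_matrix(X.shape[0])
    M = C @ (X @ X.T - Y @ Y.T) @ C
    return float(np.sum(M ** 2))

def pca_grad(X, Y):
    C = centering_matrix(X.shape[0])
    return -4.0 * C @ (X @ X.T - Y @ Y.T) @ C @ Y

def pca_by_gradient_descent(X, k, lr=2e-3, n_iter=3000, seed=0):
    # Start from a small random embedding and follow the negative gradient
    rng = np.random.default_rng(seed)
    Y = 0.1 * rng.normal(size=(X.shape[0], k))
    for _ in range(n_iter):
        Y = Y - lr * pca_grad(X, Y)
    return Y

rng = np.random.default_rng(42)
n, k = 50, 2
# Anisotropic toy data, scaled so the eigenvalues of C G_X C are roughly 25, 9, 1, 0.25
X = rng.normal(size=(n, 4)) * np.array([5.0, 3.0, 1.0, 0.5]) / np.sqrt(n)

# Reference PCA projection onto the top-k principal axes
Xc = X - X.mean(axis=0)
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca = Xc @ Vt[:k].T
predicted_min = float(np.sum(S[k:] ** 4))

# Rotate the PCA projection by a random k x k orthogonal matrix
Q, _ = np.linalg.qr(rng.normal(size=(k, k)))
Y_rot = Y_pca @ Q

Y_gd = pca_by_gradient_descent(X, k)
print(pca_loss(X, Y_pca), pca_loss(X, Y_rot), pca_loss(X, Y_gd), predicted_min)
```

The step size is kept well below the inverse curvature of this toy loss; on other data it would need retuning.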

Figures (8)

Figure 1: Examples of UMAP outputs behaving unintuitively: (a) they can remain stable despite the input's structure changing; (b) they can change despite the input's structure remaining stable.
Figure 2: Experimental verification of convergence for PCA, classical MDS and ISOMAP. We show the embeddings for each DR technique using the default method and via gradient descent on the points. We use the $L_1$-distance for the Classical MDS setting.
Figure 3: DK-PCA embeddings on the MNIST, Fashion-MNIST, Swiss-Roll and Iris datasets.
Figure 4: Embeddings under double-kernel optimization paradigms. kNN-classifier accuracy listed under each plot.
Figure 5: Normalized values of the DK-LLE loss as we optimize via UMAP (green) and standard gradient descent (blue). Both methods use the same learning rate.
...and 3 more figures
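Figure 4 reports a kNN-classifier accuracy under each embedding. The summary does not state the evaluation protocol, so the sketch below assumes a simple leave-one-out $k$-nearest-neighbor majority vote in the embedding space with Euclidean distance; `knn_accuracy` and the default `k=5` are hypothetical choices.

```python
import numpy as np

def knn_accuracy(Y, labels, k=5):
    """Leave-one-out k-nearest-neighbor accuracy of an embedding Y (assumed protocol)."""
    D = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)            # a point may not vote for itself
    nbrs = np.argsort(D, axis=1)[:, :k]    # indices of the k closest points
    votes = labels[nbrs]                   # (n, k) neighbor labels
    preds = np.array([np.bincount(v).argmax() for v in votes])
    return float(np.mean(preds == labels))

# Two well-separated 2-D clusters with integer labels 0 and 1
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(10.0, 1.0, size=(50, 2))])
labels = np.repeat([0, 1], 50)
print(knn_accuracy(Y, labels))
```

A train/test split or a different `k` would change the reported numbers; leave-one-out is only one reasonable reading of the metric.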

Theorems & Definitions (8)

Lemma 3.1
Corollary 3.2
Lemma 3.3
Theorem 3.4
Proposition 4.1
proof
proof
proof
