The Information Geometry of UMAP

Alexander Kolpakov; Aidan Rocke

TL;DR

This paper reframes UMAP through the lens of Information Geometry, linking its local probabilistic construction to geometric notions on manifolds and KL-based divergences. It clarifies how conformal rescaling and a uniformity assumption underpin the high- to low-dimensional embedding via a cross-entropy (equivalently KL) objective, and discusses the role of probabilistic kNN-graphs and kernel choices in shaping the learned geometry. It also proposes topological extensions using Vietoris–Rips complexes to capture multi-scale structure and persistence, potentially enriching the embedding with topological guarantees. The work provides a principled theoretical foundation for UMAP, highlighting its connections to Fisher metrics and suggesting practical avenues for incorporating topology into manifold learning.
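The equivalence of the cross-entropy and KL objectives mentioned above can be checked numerically. The following is a minimal sketch, assuming the edge-wise Bernoulli (fuzzy-set) form of the cross-entropy from the original UMAP formulation; the function names, the random edge probabilities, and the sample size are illustrative assumptions and are not taken from this paper. It shows that the cross-entropy and the KL divergence differ by the entropy of the high-dimensional probabilities, which does not depend on the low-dimensional ones, so both objectives share the same minimisers.

```python
import math
import random


def bernoulli_cross_entropy(p, q):
    """Sum over edges of -[p log q + (1 - p) log(1 - q)]."""
    return -sum(pi * math.log(qi) + (1 - pi) * math.log(1 - qi)
                for pi, qi in zip(p, q))


def bernoulli_entropy(p):
    """Sum over edges of the Bernoulli entropy of p (independent of q)."""
    return -sum(pi * math.log(pi) + (1 - pi) * math.log(1 - pi) for pi in p)


def bernoulli_kl(p, q):
    """Sum over edges of KL(Bernoulli(p) || Bernoulli(q))."""
    return sum(pi * math.log(pi / qi) + (1 - pi) * math.log((1 - pi) / (1 - qi))
               for pi, qi in zip(p, q))


# Hypothetical edge probabilities: p plays the role of the high-dimensional
# graph weights, q1 and q2 are two candidate low-dimensional configurations.
rng = random.Random(0)
n_edges = 50
p = [rng.uniform(0.05, 0.95) for _ in range(n_edges)]
q1 = [rng.uniform(0.05, 0.95) for _ in range(n_edges)]
q2 = [rng.uniform(0.05, 0.95) for _ in range(n_edges)]

# The gap between cross-entropy and KL is the same for every q:
gap1 = bernoulli_cross_entropy(p, q1) - bernoulli_kl(p, q1)
gap2 = bernoulli_cross_entropy(p, q2) - bernoulli_kl(p, q2)
entropy_p = bernoulli_entropy(p)

print(abs(gap1 - entropy_p) < 1e-9, abs(gap2 - entropy_p) < 1e-9)
```

Because the gap equals the entropy of `p` for any choice of `q`, ranking two embeddings by cross-entropy or by KL divergence gives the same order.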

Abstract

In this note we highlight some connections of UMAP to the basic principles of Information Geometry. Originally, UMAP was derived from Category Theory observations. However, we posit that it also has a natural geometric interpretation.

Paper Structure (17 sections, 27 equations, 2 figures, 3 tables)

Introduction
Conformal rescaling.
High-dimensional probabilities.
Low-dimensional probabilities.
Cross-entropy minimisation.
Algorithm implementation.
The Information Geometry of UMAP
Uniformity assumption
High-dimensional probabilities
Probabilistic $k$NN-graphs
Probability kernels
Low-dimensional probabilities
On the equivalence of cross-entropy and KL-divergence
Future research: Vietoris–Rips complexes
Conclusions
...and 2 more sections

Figures (2)

Figure 1: The uniform point distribution that follows from the Veselov–Shabat KdV equation: both the $3$D projection (left) and the UMAP embedding (right) resemble $2$D surfaces.
Figure 2: A non-uniform distribution: the projection still looks like a $2$D surface, but the UMAP embedding is essentially $1$D.
