Stochastic neighborhood embedding and the gradient flow of relative entropy

Ben Weinkove

TL;DR

This work formalizes SNE and t-SNE as a gradient flow of the relative entropy between fixed high-dimensional similarities $p_{ij}$ and low-dimensional similarities $q_{ij}$ defined via a function $\beta$. It derives the gradient flow, proves existence, and establishes sharp long-time diameter bounds: $\mathrm{diam}\,Y(t)\le C\,t^{1/4}$ for t-SNE and $\mathrm{diam}\,Y(t)\le C$ for SNE, with optimality demonstrated by explicit constructions. Under natural convexity and growth conditions on $\gamma(x)=1/\beta(x)$, either the diameter remains bounded or a rescaled limit $Y_\infty$ with $n$ distinct points emerges, highlighting a fundamental difference between t-SNE and SNE related to the crowding problem. The paper couples rigorous ODE analysis with illustrative examples and raises open questions about convergence rates and the structure of limit configurations, advancing the theoretical understanding of these widely used dimensionality reduction methods.
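
For concreteness, the gradient can be written out under the standard symmetric SNE/t-SNE conventions. These are assumptions, not statements quoted from the paper: points $y_1,\dots,y_n\in\mathbb{R}^d$; symmetric $p_{ij}=p_{ji}\ge 0$ with $p_{ii}=0$ and $\sum_{i\ne j}p_{ij}=1$; sums over ordered pairs; and the kernels $\beta(x)=e^{-x}$ (SNE) and $\beta(x)=(1+x)^{-1}$ (t-SNE). The paper's normalization may differ by a constant factor, which only rescales time.

```latex
\begin{aligned}
q_{ij} &= \frac{\beta(|y_i-y_j|^2)}{Z}, \qquad Z=\sum_{k\ne l}\beta(|y_k-y_l|^2), \\
\mathrm{KL}(P\,\|\,Q) &= \sum_{i\ne j} p_{ij}\log\frac{p_{ij}}{q_{ij}}
  = \sum_{i\ne j} p_{ij}\log p_{ij} - \sum_{i\ne j} p_{ij}\log\beta(|y_i-y_j|^2) + \log Z, \\
\nabla_{y_i}\mathrm{KL} &= 4\sum_{j\ne i}\,(p_{ij}-q_{ij})\,
  \frac{\gamma'(|y_i-y_j|^2)}{\gamma(|y_i-y_j|^2)}\,(y_i-y_j), \qquad \gamma=1/\beta, \\
\frac{dy_i}{dt} &= -\nabla_{y_i}\mathrm{KL}\big(P\,\|\,Q(Y(t))\big).
\end{aligned}
```

The $\log Z$ term carries coefficient one because $\sum_{i\ne j}p_{ij}=1$. For SNE, $\gamma(x)=e^{x}$ and $\gamma'/\gamma\equiv 1$; for t-SNE, $\gamma(x)=1+x$ and $\gamma'/\gamma=(1+x)^{-1}$, so the pairwise weight is constant for SNE but decays like $|y_i-y_j|^{-2}$ for t-SNE.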

Abstract

Dimension reduction, widely used in science, maps high-dimensional data into low-dimensional space. We investigate a basic mathematical model underlying the techniques of stochastic neighborhood embedding (SNE) and its popular variant t-SNE. Distances between points in high dimensions are used to define a probability distribution on pairs of points, measuring how similar the points are. The aim is to map these points to low dimensions in an optimal way so that similar points are closer together. This is carried out by minimizing the relative entropy between two probability distributions. We consider the gradient flow of the relative entropy and analyze its long-time behavior. This is a self-contained mathematical problem about the behavior of a system of nonlinear ordinary differential equations. We find optimal bounds for the diameter of the evolving sets as time tends to infinity. In particular, the diameter may blow up for the t-SNE version, but remains bounded for SNE.
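
The abstract frames SNE and t-SNE as a system of nonlinear ODEs. A minimal pure-Python sketch of that system follows, under assumptions that are ours rather than the paper's: low-dimensional similarities $q_{ij}\propto\beta(|y_i-y_j|^2)$ over ordered pairs $i\ne j$, with $\beta(x)=e^{-x}$ (SNE) or $\beta(x)=(1+x)^{-1}$ (t-SNE); the gradient $\nabla_{y_i}\mathrm{KL}=4\sum_{j\ne i}(p_{ij}-q_{ij})\,(\gamma'/\gamma)(|y_i-y_j|^2)\,(y_i-y_j)$ with $\gamma=1/\beta$; a high-dimensional $P$ built from a single unit-bandwidth Gaussian kernel instead of the perplexity-calibrated affinities used in practice; and explicit Euler time stepping. All function names (`high_dim_affinities`, `flow`, and so on) are hypothetical.

```python
import math
import random


# Assumed kernels beta(x), where x = |y_i - y_j|^2 (standard SNE / t-SNE choices).
def beta_sne(x):
    return math.exp(-x)


def beta_tsne(x):
    return 1.0 / (1.0 + x)


# Gradient weights gamma'(x) / gamma(x) for gamma = 1 / beta.
def weight_sne(x):
    return 1.0


def weight_tsne(x):
    return 1.0 / (1.0 + x)


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def similarities(points, beta):
    """q_ij = beta(|y_i - y_j|^2) / Z over ordered pairs i != j, with q_ii = 0."""
    n = len(points)
    w = [[beta(sq_dist(points[i], points[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    z = sum(sum(row) for row in w)
    return [[w_ij / z for w_ij in row] for row in w]


def high_dim_affinities(data):
    """Hypothetical simplified P: one unit-bandwidth Gaussian kernel, normalized
    over all pairs (no per-point perplexity calibration as in practical SNE)."""
    return similarities(data, beta_sne)


def relative_entropy(p, q):
    """KL(P || Q) = sum_{i != j} p_ij log(p_ij / q_ij), skipping p_ij = 0 terms."""
    n = len(p)
    return sum(p[i][j] * math.log(p[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and p[i][j] > 0.0)


def gradient(points, p, beta, weight):
    """grad_{y_i} KL = 4 sum_j (p_ij - q_ij) * weight(|y_i - y_j|^2) * (y_i - y_j)."""
    n, dim = len(points), len(points[0])
    q = similarities(points, beta)
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            c = 4.0 * (p[i][j] - q[i][j]) * weight(sq_dist(points[i], points[j]))
            for k in range(dim):
                grad[i][k] += c * (points[i][k] - points[j][k])
    return grad


def diameter(points):
    """diam Y = max over pairs of |y_i - y_j|."""
    return max(math.sqrt(sq_dist(a, b)) for a in points for b in points)


def flow(points, p, beta, weight, dt=0.05, steps=2000):
    """Explicit Euler steps for dy_i/dt = -grad_{y_i} KL(P || Q(Y))."""
    y = [list(pt) for pt in points]
    for _ in range(steps):
        g = gradient(y, p, beta, weight)
        y = [[y_ik - dt * g_ik for y_ik, g_ik in zip(y_i, g_i)]
             for y_i, g_i in zip(y, g)]
    return y


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(6)]
    P = high_dim_affinities(data)
    Y0 = [[random.gauss(0.0, 0.1) for _ in range(2)] for _ in range(6)]
    for name, beta, weight in [("SNE", beta_sne, weight_sne),
                               ("t-SNE", beta_tsne, weight_tsne)]:
        Y = flow(Y0, P, beta, weight)
        print(name, relative_entropy(P, similarities(Y, beta)), diameter(Y))
```

Running the file prints the final relative entropy and diameter for both kernels. Increasing `steps` and recording `diameter(Y)` along the way is one way to look at the SNE versus t-SNE contrast described in the abstract, but a finite run of this kind can only illustrate, not confirm, bounds that concern $t\to\infty$.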

Paper Structure

This paper contains 9 sections, 6 theorems, and 68 equations.

Introduction
The SNE and t-SNE algorithms
The gradient flow of relative entropy
Results
SNE versus t-SNE
The gradient flow of relative entropy
Proofs of the main results
Examples
Further questions

Key Result

Theorem 1.1

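The following long-time diameter bounds hold, with $C$ a constant: $\mathrm{diam}\,Y(t)\le C\,t^{1/4}$ for t-SNE, and $\mathrm{diam}\,Y(t)\le C$ for SNE.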

Theorems & Definitions (17)

Remark 1.1
Theorem 1.1
Theorem 1.2
Theorem 1.3
Theorem 1.4
Proposition 3.1
Lemma 3.1
Proof
Proof of Proposition \ref{prop}
Proof of Theorem \ref{mainthm0}
...and 7 more
