
Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE

Aditya Ravuri, Neil D. Lawrence

TL;DR

The paper addresses the lack of a unified probabilistic understanding of popular dimensionality reduction (DR) methods such as UMAP and t-SNE. It recasts these algorithms as maximum-a-posteriori (MAP) inference in a Wishart model of the graph Laplacian, where the latent coordinates define a non-linear, kernel-based covariance; this links the ProbDR framework with Gaussian process latent variable models (GPLVMs). Key contributions include a simplified ProbDR framework, a concrete distributional interpretation of UMAP and t-SNE via a non-linear kernel, and demonstrated connections to Laplacian Eigenmaps and GPLVMs. The work provides theoretical grounding for DR methods, enables principled incorporation of prior information, and offers a path toward kernel-informed, scalable embeddings.

Abstract

This paper shows that dimensionality reduction methods such as UMAP and t-SNE can be approximately recast as MAP inference methods corresponding to a model introduced in Ravuri et al. (2023) that describes the graph Laplacian (an estimate of the data precision matrix) using a Wishart distribution, with a mean given by a non-linear covariance function evaluated on the latents. This interpretation offers deeper theoretical and semantic insights into such algorithms and forges a connection to Gaussian process latent variable models by showing that well-known kernels can be used to describe covariances implied by graph Laplacians. We also introduce tools with which similar dimensionality reduction methods can be studied.
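
The abstract treats the graph Laplacian as an estimate of the data precision matrix. As a rough illustration of the object being modelled, the sketch below builds an unnormalised Laplacian from a symmetrised k-nearest-neighbour graph. The kNN construction, the function name `knn_graph_laplacian`, and the parameter `n_neighbors` are assumptions made for illustration; this summary does not specify the graph construction used in the paper.

```python
import numpy as np

def knn_graph_laplacian(X, n_neighbors=3):
    """Unnormalised Laplacian L = D - A of a symmetrised kNN graph.

    Hypothetical helper: the graph construction is an assumption,
    not taken from the paper.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Exclude self-matches when picking neighbours.
    np.fill_diagonal(sq_dists, np.inf)
    neighbours = np.argsort(sq_dists, axis=1)[:, :n_neighbors]

    # Binary adjacency: A[i, j] = 1 if j is among i's nearest neighbours.
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    A[rows, neighbours.ravel()] = 1.0
    # Symmetrise: keep an edge if either point lists the other.
    A = np.maximum(A, A.T)

    D = np.diag(A.sum(axis=1))
    return D - A

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # toy data: 20 points in 5 dimensions
L = knn_graph_laplacian(X)
```

The resulting matrix is symmetric and positive semi-definite with rows summing to zero, which is what makes it a candidate for a (singular) precision-like matrix.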

Paper Structure

This paper contains 9 sections, 1 theorem, 30 equations, 3 figures.

Key Result

Theorem 1

Assume that $\mathbf{Y}$ is distributed as […]. Then the following hold. First, denoting $d_{ij}^2 = \| \mathbf{Y}_i - \mathbf{Y}_j\|^2$, the marginal distribution is given by […]. As a consequence, $\mathbb{E}(d_{ij}^2) = d \, \tilde{k}_{ij}$ and $\mathbb{V}(d_{ij}^2) = 2d \, \tilde{k}_{ij}^2$, where $\tilde{k}_{ij} = k_{ii} + k_{jj} - 2k_{ij}$. Additionally, […]. This is a useful fact as the upper t…
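
The moment formulas in the theorem can be checked numerically. The sketch below is a Monte Carlo check under an explicit assumption, since the distribution of $\mathbf{Y}$ is elided in this summary: each of the $d$ columns of $\mathbf{Y}$ is taken to be an independent draw from $\mathcal{N}(\mathbf{0}, \mathbf{K})$, with $k_{ij}$ the entries of $\mathbf{K}$. The kernel matrix `K`, the helper `squared_distance_moments`, and the sample size are illustrative choices, not values from the paper.

```python
import numpy as np

def squared_distance_moments(K, d, i, j, n_samples=200_000, seed=0):
    """Empirical mean and variance of d_ij^2 = ||Y_i - Y_j||^2.

    Assumption (not confirmed by the source): the d columns of Y are
    independent samples from N(0, K).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    chol = np.linalg.cholesky(K)            # K = chol @ chol.T
    # Z has shape (n_samples, n, d); chol @ Z gives columns with covariance K.
    Z = rng.standard_normal((n_samples, n, d))
    Y = chol @ Z
    diff = Y[:, i, :] - Y[:, j, :]          # shape (n_samples, d)
    d2 = np.sum(diff ** 2, axis=1)
    return d2.mean(), d2.var()

# Illustrative positive-definite covariance over three points.
K = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.5, 0.3],
              [0.2, 0.3, 2.0]])
d, i, j = 2, 0, 1

k_tilde = K[i, i] + K[j, j] - 2 * K[i, j]   # k~_ij = k_ii + k_jj - 2 k_ij
theory_mean = d * k_tilde                   # E(d_ij^2) = d * k~_ij
theory_var = 2 * d * k_tilde ** 2           # V(d_ij^2) = 2d * k~_ij^2

emp_mean, emp_var = squared_distance_moments(K, d, i, j)
print(f"mean: empirical {emp_mean:.3f} vs theory {theory_mean:.3f}")
print(f"var:  empirical {emp_var:.3f} vs theory {theory_var:.3f}")
```

Under this assumption, $\mathbf{Y}_i - \mathbf{Y}_j$ has $d$ independent $\mathcal{N}(0, \tilde{k}_{ij})$ components, so $d_{ij}^2$ is $\tilde{k}_{ij}$ times a chi-squared variable with $d$ degrees of freedom, which reproduces the stated mean and variance.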

Figures (3)

  • Figure 1: Comparison between embeddings obtained using the CNE objective (top) and our inference (bottom). From left to right: MNIST digits, transcriptomic data from Macosko et al., and larger-scale transcriptomic data from Zheng et al. The appendix on GPLVMs shows that PCA and GPLVMs with a similar kernel do not produce similar embeddings, presumably because the Laplacian encodes different statistics of the data than the empirical covariance does.
  • Figure 2: MNIST digits embedded using PCA (left), a GPLVM with a sum of linear, constant, t, and noise kernels and initializations scaled towards zero (center), and the same GPLVM with unscaled PCA initializations (right). In each case, the GPLVM hyperparameters were first "pre-trained" using the PCA-initialized embeddings for 10 epochs, and the embeddings were then trained for a further 40 epochs. In every case, note the visual dissimilarity to our versions of UMAP/t-SNE.
  • Figure 3: MNIST digits embedded using our probabilistic interpretation derived using neg-t-SNE.

Theorems & Definitions (1)

  • Theorem 1: Distribution of normal distances