Manifold Learning with Sparse Regularised Optimal Transport

Stephen Zhang; Gilles Mordant; Tetsuya Matsumoto; Geoffrey Schiebinger

TL;DR

The paper develops a manifold learning framework based on a symmetric, quadratically regularised optimal transport (QOT) projection to form a sparse, adaptive affinity matrix that respects latent geometry. It proves that the induced discrete operator converges to a Laplace-type operator, remains robust to heteroskedastic ambient noise, and exhibits a motivating link to nonlinear diffusion via the porous medium equation. Theoretical contributions include finite-sample dual-potential rates, robustness bounds, and convergence results, supplemented by an efficient symmetric semi-smooth Newton solver and an active-set variant for large datasets. Empirically, the method demonstrates superior resilience to noise and competitive performance across manifolds, spectral clustering, MNIST, and single-cell RNA-seq data, outperforming traditional kNN- or entropic-based approaches. This work offers a scalable, geometry-preserving approach to diffusion-based manifold learning with practical implications for high-dimensional data analysis where noise and sampling heterogeneity are prevalent.

Abstract

Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high-dimensional ambient space; however, the degrees of freedom intrinsic to the data are usually far fewer than the number of ambient dimensions. Detecting the latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observations and sampling, so distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that uses a symmetric version of optimal transport with quadratic regularisation to construct a sparse and adaptive affinity matrix, which can be interpreted as a generalisation of the bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise, and illustrate these results in numerical experiments. We identify a highly efficient scheme for computing this optimal transport for discrete data and demonstrate that it outperforms competing methods in a set of examples.
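To make the construction in the abstract concrete, the following is a minimal sketch of a symmetric, quadratically regularised optimal transport projection. It assumes the problem is to minimise $\langle C, P \rangle + \tfrac{\varepsilon}{2}\|P\|_F^2$ over symmetric non-negative matrices with unit row sums, whose optimality conditions give $P_{ij} = [u_i + u_j - C_{ij}]_+ / \varepsilon$ for a single dual potential $u$. The normalisation, the function name `qot_projection`, the initialisation, and the damped Newton iteration with backtracking are illustrative assumptions, not the paper's solver.

```python
import numpy as np

def qot_projection(C, eps, max_iter=200, tol=1e-10):
    """Sketch: symmetric quadratically regularised OT with unit row sums.

    Assumed form: P_ij = max(u_i + u_j - C_ij, 0) / eps, with the dual
    potential u chosen so that every row of P sums to one.
    """
    n = C.shape[0]

    def dual(u):
        # Concave dual objective; its gradient is 2 * (1 - P @ 1).
        Z = u[:, None] + u[None, :] - C
        return 2.0 * u.sum() - (np.maximum(Z, 0.0) ** 2).sum() / (2.0 * eps)

    # Start with full support so that the first Jacobian is well conditioned.
    u = np.full(n, 0.5 * C.max() + eps / (2.0 * n))
    for _ in range(max_iter):
        Z = u[:, None] + u[None, :] - C
        P = np.maximum(Z, 0.0) / eps
        F = P.sum(axis=1) - 1.0                    # row-sum residual
        if np.abs(F).max() < tol:
            break
        S = (Z > 0).astype(float)                  # active (non-zero) entries
        J = (np.diag(S.sum(axis=1)) + S) / eps     # generalised Jacobian of F
        du = np.linalg.solve(J + 1e-8 * np.eye(n), -F)
        slope = -2.0 * (F @ du)                    # directional derivative of dual
        g0, t = dual(u), 1.0
        # Backtracking (Armijo) line search with a small round-off allowance.
        while t > 1e-12 and dual(u + t * du) < g0 + 1e-4 * t * slope - 1e-12 * abs(g0):
            t *= 0.5
        u = u + t * du
    Z = u[:, None] + u[None, :] - C
    return np.maximum(Z, 0.0) / eps, u

# Toy data (hypothetical): 40 noisy points on a circle, squared Euclidean cost.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0.0, 2.0 * np.pi, 40))
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(40, 2))
C = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
P, u = qot_projection(C, eps=0.5)
print("fraction of zero entries:", (P == 0).mean())
```

The thresholding $[\cdot]_+$ is what produces exact zeros in the affinity matrix, in contrast to an entropic regularisation, whose solution is strictly positive everywhere. The dense linear solve is only suitable for small problems.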

Paper Structure

This paper contains 37 sections, 9 theorems, 119 equations, 11 figures, 1 table, 2 algorithms.

Introduction
Manifold learning: graphs, affinity matrices, and operators
Affinity matrices
Laplacian operators
Normalisation of the Laplacian matrix
Bistochastic normalisations of affinity matrices
Contributions: from bistochastic normalisations to sparse regularised optimal transport
Bistochastic projections of affinity matrices
Generalised bistochastic information projections
A spectral motivation for projections in Frobenius norm
Geometric view on the optimisation problem
Sketch of the theoretical results
Theoretical contributions
Nearly optimal potentials
Validity of the derived finite sample rate
...and 22 more sections

Key Result

Proposition 1

Consider the heteroskedastic noise model introduced in Eq. (HetNoise). Denote by $\widetilde{C}$ the squared Euclidean cost matrix based on the corrupted points $\tilde{x}_i$. Then [...], where $\mathcal{O}_p$ refers to stochastic boundedness.

Figures (11)

Figure 1: Spiral in high dimension with non-uniform noise. (a) (i) Clean points ($N = 1000$) sampled evenly from a closed spiral in 3 dimensions. (ii) Points embedded in $d = 250$ dimensions, subjected to non-uniform noise in the ambient high-dimensional space. (b) Laplacian eigenspace angles between a ground truth reference Laplacian and Laplacians obtained using various affinity matrix constructions and choices of the bandwidth parameter ($\varepsilon$). (c) Best 2-dimensional Laplacian eigenfunction embeddings found for each affinity matrix construction. (d) (i) Interpolation between two measures $(\mu_0, \mu_1)$ supported on the spiral, computed as Sinkhorn barycenters using a ground cost induced by different affinity matrix constructions. (ii) Error (measured in terms of the Sinkhorn divergence between distributions) of measure interpolations with respect to a ground truth interpolation. (e) For fixed parameters, the spectral embeddings obtained using the bistochastic Frobenius projection (QOT) and $k$-nearest neighbours ($k$-NN) from the top $d$ PCA coordinates, as $d$ ranges from $5$ to $250$.
Figure 2: Gaussian mixture model. (a) (i) $N = 1500$ points sampled from a mixture of 3 Gaussians (500 points each) in $d = 50$ dimensions, shown in PCA coordinates and coloured by their true labels. (ii) Distribution of effective neighbourhood sizes (perplexities) for QOT (Frobenius) and EOT (entropic) bistochastic projections, where the mean perplexity is 30 in both cases. (iii) Corresponding affinity matrices. (b) (i-ii) Leading eigenvectors of the graph Laplacian constructed from each affinity matrix. (c) Performance for varying $(d, \varepsilon)$ measured in terms of (i) normalised mutual information (NMI) of spectral clustering result and (ii) eigenspace angle of leading 6 eigenvectors to template subspace. (d) Performance for different numbers of principal components with fixed parameters $\varepsilon$, $k$.
Figure 3: MNIST dataset. (a) (i) $N = 750$ MNIST images sampled from the digits 1, 2, 7, 9 (250 images each) in $d = 784$ dimensions, shown in PCA coordinates and coloured by their true labels. (ii) Same as in Figure 2(a)(ii) but with mean perplexity 13. (iii) Corresponding affinity matrices. (b) Same as in Figure 2(b). (c) Performance measured in terms of (i) normalised mutual information (NMI) of spectral clustering result and (ii) eigenspace angle of leading 8 eigenvectors to template subspace. (d) (i-ii) MDS layouts of leading 8 diffusion components and corresponding eigenvectors, for L2 and entropic bistochastic projections. (iii) MDS layout of diffusion components for the digit 1 recovers a 1-dimensional manifold corresponding to a rotation for the bistochastic L2 projection, while this structure is corrupted in the entropic case.
Figure 4: Single cell RNA sequencing dataset. (a) Sub-sampled dentate gyrus dataset [la2018rna] (5000 cells, 2239 dimensions) visualised in $t$-SNE coordinates from the original publication, and coloured by cell type annotation. (b) L2 and entropic bistochastic projections of the linear kernel, ordered by cell type annotation. (c) Distribution of effective neighbourhood sizes (perplexities) for L2 and entropic bistochastic projections. (d) Left: Cells coloured by squared distance from origin cell (red) measured in terms of the 10-dimensional spectral embedding, for L2 and entropic bistochastic projections as well as $k$-NN affinity matrix constructions. Right: terminal cell states predicted by [la2018rna] using RNA velocity estimates. (e) Distribution of Laplacian eigenvalues for L2 and entropic bistochastic projections. Inset shows occurrence of leading eigenvalues in the range $[0, 0.07]$. (f) Same as (d) but shown for Granule and CA branches, together with the Pearson correlation between spectral embedding distance to origin and RNA velocity end state probability.
Figure 5: Simulated scRNA-seq batch effects. (a) 2000 cells sampled from Granule trajectory with simulated batch effect. (b) Dual potentials from QOT and EOT. (c) Affinity matrices for various constructions, coloured by cell type and batch. (d) $t$-SNE embeddings obtained using different affinity constructions.
...and 6 more figures
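
The captions above report two diagnostics: effective neighbourhood sizes (perplexities) and eigenspace angles. The sketch below shows one common way to compute them. It assumes that perplexity means the exponential of the Shannon entropy of each normalised row of the affinity matrix, that the Laplacian is $L = I - P$ for a bistochastic $P$, and that the eigenspace angle is the set of principal angles between the spans of leading eigenvectors. The function names are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np

def row_perplexity(P):
    """Effective neighbourhood size per row: exp(Shannon entropy of the row)."""
    P = P / P.sum(axis=1, keepdims=True)
    logP = np.log(np.where(P > 0, P, 1.0))   # treat 0 * log 0 as 0
    return np.exp(-(P * logP).sum(axis=1))

def leading_eigenspace(P, k):
    """Eigenvectors of L = I - P for the k smallest eigenvalues (P symmetrised)."""
    n = P.shape[0]
    L = np.eye(n) - 0.5 * (P + P.T)
    _, V = np.linalg.eigh(L)                 # eigenvalues in ascending order
    return V[:, :k]

def principal_angles(A, B):
    """Principal angles (radians) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# Toy check: two disconnected blocks give a 2-dimensional null space of L,
# spanned by the block indicator vectors.
B = np.kron(np.eye(2), np.full((2, 2), 0.5))
V = leading_eigenspace(B, 2)
ref = np.kron(np.eye(2), np.ones((2, 1)))
print("perplexities:", row_perplexity(B))
print("largest principal angle:", principal_angles(V, ref).max())
```

A sparse affinity matrix has rows with few non-zero entries, so its row perplexities are bounded by the row support sizes, which is one reason this quantity is a natural way to match neighbourhood scales across constructions.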

Theorems & Definitions (17)

Proposition 1
Remark 2: Convergence rates
Remark 5: No loss of generality in choosing $X_0$
Lemma 6
Remark 7
Remark 8
Remark 9
Lemma 10
Remark 11: Extension to RKHS
Theorem 12
...and 7 more
