Manifold Learning with Sparse Regularised Optimal Transport

Stephen Zhang, Gilles Mordant, Tetsuya Matsumoto, Geoffrey Schiebinger

TL;DR

The paper develops a manifold learning framework based on a symmetric, quadratically regularised optimal transport (QOT) projection to form a sparse, adaptive affinity matrix that respects latent geometry. It proves that the induced discrete operator converges to a Laplace-type operator, remains robust to heteroskedastic ambient noise, and exhibits a motivating link to nonlinear diffusion via the porous medium equation. Theoretical contributions include finite-sample dual-potential rates, robustness bounds, and convergence results, supplemented by an efficient symmetric semi-smooth Newton solver and an active-set variant for large datasets. Empirically, the method demonstrates superior resilience to noise and competitive performance across manifolds, spectral clustering, MNIST, and single-cell RNA-seq data, outperforming traditional kNN- or entropic-based approaches. This work offers a scalable, geometry-preserving approach to diffusion-based manifold learning with practical implications for high-dimensional data analysis where noise and sampling heterogeneity are prevalent.

Abstract

Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high-dimensional ambient space; however, the degrees of freedom intrinsic to the data are usually far fewer than the number of ambient dimensions. The task of detecting a latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation to construct a sparse and adaptive affinity matrix that can be interpreted as a generalisation of the bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise, and exhibit these results in numerical experiments. We identify a highly efficient computational scheme for computing this optimal transport for discrete data and demonstrate that it outperforms competing methods in a set of examples.
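
To make the projection described in the abstract more concrete, here is a minimal sketch in Python. It assumes the problem takes the form $\min_W \langle C, W \rangle + \tfrac{\varepsilon}{2}\|W\|_F^2$ over symmetric, non-negative $W$ with unit row sums, where $C$ is the squared Euclidean cost matrix. Under that assumption the optimum has the form $W_{ij} = (u_i + u_j - C_{ij})_+ / \varepsilon$ for a single dual potential $u$. The function name `symmetric_qot`, the ridge and line-search constants, and the toy circle data are illustrative choices. The dense Newton iteration with backtracking is a simple stand-in and is not the paper's solver or its active-set variant.

```python
import numpy as np

def symmetric_qot(C, eps, max_iter=200, tol=1e-10, ridge=1e-8):
    """Sketch: min_W <C, W> + (eps/2)||W||_F^2  s.t.  W = W^T >= 0, W 1 = 1.

    Assumes C is a symmetric cost matrix. Returns (W, u) with
    W_ij = max(u_i + u_j - C_ij, 0) / eps.
    """
    n = C.shape[0]

    def dual(u):
        # Convex dual objective; its gradient is (row sums of W) - 1.
        R = np.maximum(u[:, None] + u[None, :] - C, 0.0)
        return (R ** 2).sum() / (4.0 * eps) - u.sum(), R

    u = np.full(n, eps / 2.0)  # with zero diagonal cost, gives W_ii = 1
    phi, R = dual(u)
    for _ in range(max_iter):
        grad = R.sum(axis=1) / eps - 1.0
        if np.abs(grad).max() < tol:
            break
        # Generalised Hessian from the current active set, plus a small ridge.
        A = (R > 0).astype(float)
        H = (np.diag(A.sum(axis=1)) + A) / eps + ridge * np.eye(n)
        step = np.linalg.solve(H, -grad)
        slope = grad @ step
        slack = 1e-12 * max(1.0, abs(phi))  # guards against rounding stalls
        t = 1.0
        while True:  # Armijo backtracking on the dual objective
            phi_new, R_new = dual(u + t * step)
            if phi_new <= phi + 1e-4 * t * slope + slack or t < 1e-12:
                break
            t *= 0.5
        u, phi, R = u + t * step, phi_new, R_new
    return R / eps, u

# Toy data (hypothetical): 40 jittered points on a unit circle.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(40, 2))
C = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W, u = symmetric_qot(C, eps=0.1)
```

In this toy run each row of `W` sums to one and most entries are exactly zero, so each point keeps only a handful of neighbours. The size of that neighbourhood is set jointly by $\varepsilon$ and the local point density rather than by a fixed $k$.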

Paper Structure

This paper contains 37 sections, 9 theorems, 119 equations, 11 figures, 1 table, 2 algorithms.

Key Result

Proposition 1

Consider the heteroskedastic noise model introduced in eq: HetNoise. Denote by $\widetilde{C}$ the squared Euclidean cost matrix based on the corrupted points $\tilde{x}_i$. Then, where $\mathcal{O}_p$ refers to stochastic boundedness.

Figures (11)

  • Figure 1: Spiral in high dimension with non-uniform noise. (a) (i) Clean points ($N = 1000$) sampled evenly from a closed spiral in 3 dimensions. (ii) Points embedded in $d = 250$ dimensions, subjected to non-uniform noise in the ambient high-dimensional space. (b) Laplacian eigenspace angles between a ground truth reference Laplacian and Laplacians obtained using various choices of affinity matrix construction and bandwidth parameter ($\varepsilon$) choices. (c) Best 2-dimensional Laplacian eigenfunction embeddings found for each affinity matrix construction. (d) (i) Interpolation between two measures $(\mu_0, \mu_1)$ supported on the spiral, computed as Sinkhorn barycenters using a ground cost induced by different affinity matrix constructions. (ii) Error (measured in terms of the Sinkhorn divergence between distributions) of measure interpolations with respect to a ground truth interpolation. (e) For fixed parameters, the spectral embedding obtained using the bistochastic Frobenius projection (QOT) and $k$-nearest neighbours ($k$-NN) with the top $d$ PCA coordinates as $d$ ranges from $5$ to $250$.
  • Figure 2: Gaussian mixture model. (a) (i) $N = 1500$ points sampled from a mixture of 3 Gaussians (500 points each) in $d = 50$ dimensions, shown in PCA coordinates and coloured by their true labels. (ii) Distribution of effective neighbourhood sizes (perplexities) for QOT (Frobenius) and EOT (entropic) bistochastic projections, where the mean perplexity is 30 in both cases. (iii) Corresponding affinity matrices. (b) (i-ii) Leading eigenvectors of the graph Laplacian constructed from each affinity matrix. (c) Performance for varying $(d, \varepsilon)$ measured in terms of (i) normalised mutual information (NMI) of spectral clustering result and (ii) eigenspace angle of leading 6 eigenvectors to template subspace. (d) Performance for different numbers of principal components with fixed parameters $\varepsilon$, $k$.
  • Figure 3: MNIST dataset. (a) (i) $N = 750$ MNIST images sampled from digits 1, 2, 7, 9 (250 images each) in $d = 784$ dimensions, shown in PCA coordinates and coloured by their true labels. (ii) Same as in Figure 2(a)(ii) but with mean perplexity 13. (iii) Corresponding affinity matrices. (b) Same as in Figure 2(b). (c) Performance measured in terms of (i) normalised mutual information (NMI) of spectral clustering result and (ii) eigenspace angle of leading 8 eigenvectors to template subspace. (d) (i-ii) MDS layouts of leading 8 diffusion components and corresponding eigenvectors, for L2 and entropic bistochastic projections. (iii) MDS layout of diffusion components for the digit 1 recovers a 1-dimensional manifold corresponding to a rotation for the bistochastic L2 projection, while this is corrupted in the entropic case.
  • Figure 4: Single cell RNA sequencing dataset. (a) Sub-sampled dentate gyrus dataset la2018rna (5000 cells, 2239 dimensions) visualised in $t$-SNE coordinates from the original publication, and coloured by cell type annotation. (b) L2 and entropic bistochastic projections of the linear kernel, ordered by cell type annotation. (c) Distribution of effective neighbourhood sizes (perplexities) for L2 and entropic bistochastic projections. (d) Left: Cells coloured by squared distance from origin cell (red) measured in terms of the 10-dimensional spectral embedding, for L2 and entropic bistochastic projections as well as $k$-NN affinity matrix constructions. Right: terminal cell states predicted by la2018rna using RNA velocity estimates. (e) Distribution of Laplacian eigenvalues for L2 and entropic bistochastic projections. Inset shows occurrence of leading eigenvalues in the range $[0, 0.07]$. (f) Same as (d) but shown for Granule and CA branches, together with the Pearson correlation between spectral embedding distance to origin and RNA velocity end state probability.
  • Figure 5: Simulated scRNA-seq batch effects. (a) 2000 cells sampled from Granule trajectory with simulated batch effect. (b) Dual potentials from QOT and EOT. (c) Affinity matrices for various constructions, coloured by celltype and batch. (d) $t$-SNE embeddings obtained using different affinity constructions.
  • ...and 6 more figures
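
Several captions above report effective neighbourhood sizes (perplexities) and eigenspace angles. The sketch below shows one common way to compute such diagnostics. It assumes that perplexity means the exponential of the Shannon entropy of each row-normalised affinity row, and that eigenspace angles are the principal angles between two subspaces. The paper's exact definitions may differ, and the helper names `row_perplexity` and `principal_angles` are hypothetical.

```python
import numpy as np

def row_perplexity(W):
    """exp(Shannon entropy) of each row of W, after normalising rows to sum to 1."""
    P = W / W.sum(axis=1, keepdims=True)
    logP = np.log(np.where(P > 0, P, 1.0))  # convention: 0 * log 0 = 0
    return np.exp(-(P * logP).sum(axis=1))

def principal_angles(U, V):
    """Principal angles (radians) between the column spaces of U and V."""
    Qu, _ = np.linalg.qr(U)  # orthonormal basis for span(U)
    Qv, _ = np.linalg.qr(V)  # orthonormal basis for span(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# Toy check: a row spread evenly over k entries has perplexity k.
W_toy = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.25, 0.25, 0.25, 0.25]])
perp = row_perplexity(W_toy)

# Identical subspaces give angles near 0; orthogonal ones give pi/2.
E = np.eye(4)
angles_same = principal_angles(E[:, :2], E[:, :2])
angles_orth = principal_angles(E[:, :2], E[:, 2:])
```

Read this way, a perplexity of 30 means a row spreads its mass over roughly 30 neighbours. The same mean perplexity can therefore hide very different row-by-row distributions.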

Theorems & Definitions (17)

  • Proposition 1
  • Remark 2: Convergence rates
  • Remark 5: No loss of generality in choosing $X_0$
  • Lemma 6
  • Remark 7
  • Remark 8
  • Remark 9
  • Lemma 10
  • Remark 11: Extension to RKHS
  • Theorem 12
  • ...and 7 more