Transport Clustering: Solving Low-Rank Optimal Transport via Clustering

Henri Schmidt, Peter Halmos, Ben Raphael

TL;DR

The paper introduces transport clustering, an algorithm that computes a low-rank OT plan by reducing low-rank OT to a clustering problem on correspondences obtained from a full-rank $\textit{transport registration}$ step, and proves that this reduction yields polynomial-time, constant-factor approximation algorithms for low-rank OT.

Abstract

Optimal transport (OT) finds a least-cost transport plan between two probability distributions using a cost matrix defined on pairs of points. Unlike standard OT, which infers unstructured pointwise mappings, low-rank optimal transport explicitly constrains the rank of the transport plan to infer latent structure. This improves statistical stability and robustness, yields sharper parametric rates for estimating Wasserstein distances adaptive to the intrinsic rank, and generalizes $K$-means to co-clustering. These advantages, however, come at the cost of a non-convex and NP-hard optimization problem. We introduce transport clustering, an algorithm to compute a low-rank OT plan that reduces low-rank OT to a clustering problem on correspondences obtained from a full-rank $\textit{transport registration}$ step. We prove that this reduction yields polynomial-time, constant-factor approximation algorithms for low-rank OT: specifically, a $(1+\gamma)$-approximation for negative-type metrics and a $(1+\gamma+\sqrt{2\gamma}\,)$-approximation for kernel costs, where $\gamma \in [0,1]$ denotes the approximation ratio of the optimal full-rank solution relative to the low-rank optimal. Empirically, transport clustering outperforms existing low-rank OT solvers on synthetic benchmarks and large-scale, high-dimensional datasets.
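The two-stage reduction described in the abstract (register with full-rank OT, then cluster the resulting correspondences) can be illustrated with a minimal sketch. This is not the authors' implementation and rests on several assumptions: equal-size point clouds with uniform marginals, a squared Euclidean cost, SciPy's `linear_sum_assignment` as the registration solver, and plain Lloyd-style $K$-means on concatenated pairs $(x_i, y_{\sigma(i)})$ as a simplified stand-in for the generalized $K$-means step. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def transport_registration(X, Y):
    """Full-rank OT between equal-size uniform clouds, solved as an assignment problem."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost (assumption)
    rows, cols = linear_sum_assignment(C)
    sigma = np.empty(len(X), dtype=int)
    sigma[rows] = cols  # x_i is matched to y_{sigma[i]}
    return C, sigma


def kmeans(Z, K, n_iter=50, seed=0):
    """Plain Lloyd iterations; a stand-in for generalized K-means (assumption)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=K, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):  # leave an empty cluster's center unchanged
                centers[k] = Z[labels == k].mean(axis=0)
    return labels


def transport_clustering_sketch(X, Y, K, seed=0):
    """Register, cluster the registered pairs, and build a block coupling of rank <= K."""
    C, sigma = transport_registration(X, Y)
    Z = np.hstack([X, Y[sigma]])  # one point per correspondence (x_i, y_sigma(i))
    labels = kmeans(Z, K, seed=seed)
    n = len(X)
    P = np.zeros((n, n))
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        # Spread mass uniformly between cluster k in X and its registered partners in Y.
        P[np.ix_(idx, sigma[idx])] = 1.0 / (n * len(idx))
    return P, labels, sigma, C


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
Y = rng.normal(size=(60, 2)) + 1.0
P, labels, sigma, C = transport_clustering_sketch(X, Y, K=3)
print("full-rank cost:", C[np.arange(len(X)), sigma].mean())
print("low-rank cost: ", (C * P).sum())
```

Because each cluster in $X$ is coupled only with its registered partners in $Y$, the resulting plan keeps uniform marginals and has rank at most $K$. Its cost can never fall below the full-rank cost, since the low-rank plan is itself a feasible coupling.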

Paper Structure

This paper contains 23 sections, 15 theorems, 115 equations, 10 figures, 6 tables, 2 algorithms.

Key Result

Theorem 4.1

Let $\mathbf{C} \in \mathbb{R}^{n \times n}$ be a cost matrix induced by either i) a metric of negative type, ii) a kernel cost, or iii) a cost satisfying the triangle inequality. If $\mathbf{P}_{\sigma^{\star}}$ denotes the full-rank optimal transport plan for $\mathbf{C}$ and $\tilde{\mathbf{C}} = \ldots$ where $\gamma \in [0,1]$ is the ratio of the costs of the optimal rank-$n$ and rank-$K$ solutions and …
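
To make the approximation factors quoted in the abstract concrete, the table below evaluates them at a few values of $\gamma$. This is plain arithmetic on the stated bounds, not an additional result from the paper.

$$
\begin{array}{c|cc}
\gamma & 1+\gamma \ \text{(negative-type metrics)} & 1+\gamma+\sqrt{2\gamma} \ \text{(kernel costs)} \\ \hline
0 & 1 & 1 \\
1/2 & 3/2 & 5/2 \\
1 & 2 & 2+\sqrt{2} \approx 3.41
\end{array}
$$

At the worst case $\gamma = 1$ the guarantees are $2$ and roughly $3.41$, and both tighten to $1$ as $\gamma \to 0$.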

Figures (10)

  • Figure 1: TC on (a) a synthetic 2-Moons ($X$) and 8-Gaussians ($Y$) dataset ($n=m = 1024$) from tong2023improving with the (b) Monge map alignment of $X$ and $Y=\sigma(X)$ using halmos2025hierarchical. TC reduces low-rank OT (co-clustering) to (c) clustering a single set of Monge registered correspondences using generalized $K$-means.
  • Figure 2: The relative cost of the rank $K \in \{50, 75, \ldots, 250\}$ transport plan inferred by LOT, FRLC, and LatentOT compared to the cost of the transport plan inferred by TC across $315$ synthetic instances (lower is better). Each dataset contains $n = m = 5000$ data points. LatentOT is excluded from the stochastic block model evaluation as it takes as input a squared Euclidean cost matrix.
  • Figure 3: Estimation of squared Wasserstein distance on the fractured hypercube of forrow19a. Convergence shown for fixed $d=30$, $K=10$, and averaged over $10$ runs.
  • Figure 4: Geometric constructions providing lower bounds for Theorem \ref{thm:reduction_to_kcut} in the case of (left) Euclidean cost ($k = 3$) and (right) squared Euclidean cost ($k = 2$). Points in $X$ are colored black and points in $Y$ are colored white. Points connected by a line segment have identical coordinates and are separated for ease of visualization.
  • Figure 5: Comparison of low-rank OT methods on the stochastic block model dataset. (Left) Relative cost of the rank $K \in \{10, \ldots, 100\}$ transport plan inferred by LOT and FRLC compared to the cost of the transport plan inferred by TC. (Right) Co-clustering accuracy (AMI/ARI) of TC, LOT, and FRLC at rank $K = 100$. The stochastic block model dataset consists of $100$ clusters of size $50$.
  • ...and 5 more figures

Theorems & Definitions (34)

  • Definition 2.1
  • Definition 3.1
  • Theorem 4.1
  • Proposition 4.2
  • Theorem 4.3
  • Lemma 1.1
  • Lemma 1.2
  • proof
  • Lemma 1.2
  • proof
  • ...and 24 more