Geometry-Aware Optimal Transport: Fast Intrinsic Dimension and Wasserstein Distance Estimation

Ferdinand Genans, Olivier Wintenberger

TL;DR

This work tackles the discretization bottleneck in large-scale Optimal Transport by estimating the discretization error and the intrinsic data dimension directly from samples. It introduces a solver-free Monte Carlo estimator for OT discretization error via the semi-dual, and a linear-time intrinsic-dimension estimator based on multi-scale error decay. The authors then couple entropic-regularization bias and statistical bias with a Diagonal Richardson extrapolation, calibrated by the estimated d_int, and bolster it with a bagging variant to reduce variance. Empirical results on synthetic manifolds and image datasets (e.g., MNIST, FashionMNIST, CIFAR10) demonstrate substantial bias reduction and practical efficiency, enabling geometry-aware Wasserstein estimation and actionable calibration of OT methods for real-world data.

Abstract

Solving large-scale Optimal Transport (OT) in machine learning typically relies on sampling measures to obtain a tractable discrete problem. While the discrete solver's accuracy is controllable, the rate of convergence of the discretization error is governed by the intrinsic dimension of our data. Therefore, the true bottleneck is the knowledge and control of the sampling error. In this work, we tackle this issue by introducing novel estimators for both sampling error and intrinsic dimension. The key finding is a simple, tuning-free estimator of $\text{OT}_c(\rho, \hat{\rho})$ that utilizes the semi-dual OT functional and, remarkably, requires no OT solver. Furthermore, we derive a fast intrinsic dimension estimator from the multi-scale decay of our sampling error estimator. This framework unlocks significant computational and statistical advantages in practice, enabling us to (i) quantify the convergence rate of the discretization error, (ii) calibrate the entropic regularization of Sinkhorn divergences to the data's intrinsic geometry, and (iii) introduce a novel, intrinsic-dimension-based Richardson extrapolation estimator that strongly debiases Wasserstein distance estimation. Numerical experiments demonstrate that our geometry-aware pipeline effectively mitigates the discretization error bottleneck while maintaining computational efficiency.
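
To make the "multi-scale decay" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the cost is the Euclidean distance, that the sampling error at scale $n$ is approximated by the mean nearest-neighbor distance from fresh samples to an $n$-point support, and that this error decays like $n^{-1/d_{\mathrm{int}}}$, so the slope of a log-log fit gives $-1/d_{\mathrm{int}}$. The function names, the scales, and the test distribution (a 2D square isometrically embedded in $\mathbb{R}^5$) are all hypothetical choices made for illustration.

```python
import numpy as np

def mean_nn_cost(eval_pts, support):
    """Monte Carlo estimate of E[min_i ||x - x_i||] over fresh evaluation points."""
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = (
        (eval_pts ** 2).sum(axis=1)[:, None]
        + (support ** 2).sum(axis=1)[None, :]
        - 2.0 * eval_pts @ support.T
    )
    # Clip tiny negative values caused by round-off before taking the root.
    return np.sqrt(np.clip(sq.min(axis=1), 0.0, None)).mean()

def estimate_intrinsic_dim(sampler, scales=(200, 400, 800, 1600), n_eval=2000, seed=0):
    """Fit log(error) against log(n); assumed model: error ~ n**(-1/d)."""
    rng = np.random.default_rng(seed)
    support = sampler(max(scales), rng)   # nested supports: first n points per scale
    eval_pts = sampler(n_eval, rng)       # independent samples for the expectation
    errs = [mean_nn_cost(eval_pts, support[:n]) for n in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(errs), 1)
    return -1.0 / slope

def embedded_square(n, rng, ambient_dim=5):
    """Hypothetical test distribution: uniform on [0,1]^2, embedded in R^5."""
    # Fixed orthonormal basis so every call uses the same embedding.
    basis, _ = np.linalg.qr(np.random.default_rng(123).normal(size=(ambient_dim, 2)))
    return rng.uniform(size=(n, 2)) @ basis.T

d_hat = estimate_intrinsic_dim(embedded_square)
print(f"estimated intrinsic dimension: {d_hat:.2f}")
```

Each scale costs one pass over the distance matrix between evaluation points and support, with no OT solver involved. Boundary effects and finite-sample noise bias the fitted slope slightly, so the estimate should be read as approximate.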

Paper Structure (35 sections, 8 theorems, 93 equations, 3 figures, 1 table)

This paper contains 35 sections, 8 theorems, 93 equations, 3 figures, 1 table.

Key Result

Proposition 3.1

Let $c$ be a continuous cost. Given $\rho \in \mathcal{P}(\mathbb{R}^d)$ and a support $X=(x_1,\ldots,x_n)\in(\mathbb{R}^d)^n$, define the weights $\mathbf{w}_n=(w_1,\ldots,w_n)^\top$ by and set $\widehat{\rho}^*_n=\sum_{i=1}^n w_i \delta_{x_i}$. Then $\widehat{\rho}_n^*$ provides the best approximation of $\rho$ supported on $X$, that is, and the minimizer is unique. Moreover, the zero vector …
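
The formulas in this statement are elided in the excerpt, so the following is only a sketch under an explicit assumption: that the weights are the $\rho$-masses of the nearest-neighbor (Voronoi) cells of the support under $c$, in which case the classical identity $\text{OT}_c(\rho, \widehat{\rho}^*_n) = \mathbb{E}_{x\sim\rho}[\min_i c(x, x_i)]$ holds and both quantities can be estimated by Monte Carlo without an OT solver. The function name and the choice $c(x,y)=\|x-y\|$ are hypothetical.

```python
import numpy as np

def voronoi_weights_and_error(eval_pts, support):
    """Monte Carlo estimates of the cell masses w_i and of E[min_i ||x - x_i||].

    Assumption (not taken from the excerpt): w_i is the mass that rho puts on
    the set of points whose nearest support point is x_i.
    """
    sq = (
        (eval_pts ** 2).sum(axis=1)[:, None]
        + (support ** 2).sum(axis=1)[None, :]
        - 2.0 * eval_pts @ support.T
    )
    nearest = sq.argmin(axis=1)  # index of the closest support point per sample
    weights = np.bincount(nearest, minlength=len(support)) / len(eval_pts)
    error = np.sqrt(np.clip(sq.min(axis=1), 0.0, None)).mean()
    return weights, error

# Toy check: rho uniform on [0, 1], support {0, 1}.
# Each cell has mass 1/2 and E[min(x, 1 - x)] = 1/4.
rng = np.random.default_rng(0)
samples = rng.uniform(size=(100_000, 1))
support = np.array([[0.0], [1.0]])
weights, err = voronoi_weights_and_error(samples, support)
print(weights, err)
```

In the toy check the estimates can be compared against the closed-form values $(1/2, 1/2)$ and $1/4$, which is a quick way to sanity-check the Monte Carlo accuracy for a given number of samples.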

Figures (3)

  • Figure 1: Intrinsic Dimension Estimation Benchmark. Comparison of our Semi-Discrete $W_1$ estimator against the Discrete $W_1$ baseline and standard geometric estimators. The dashed line represents the ground truth effective dimension $d_{\mathrm{int}} = 10$. Our estimator matches the robustness of discrete OT on mixtures but runs orders of magnitude faster ($<0.1$s vs $\sim 10$s).
  • Figure 2: Sensitivity to Intrinsic Dimension. We perform Diagonal Richardson extrapolation, $2n = 2000$, on a source synthetic mixture in $\mathbb{R}^{10}$ (90% mass on a 5D Gaussian, 10% on a 1D Gaussian) to a 10D Gaussian. By varying the dimension parameter $d$ used in the extrapolation weights, we observe that the estimation error is minimized exactly at the dominant intrinsic dimension $d=5$. This confirms that correctly calibrating the schedule to the effective geometry is essential for optimal debiasing.
  • Figure 3: Variance Reduction via Bagging. Using the same mixture setting as in Figure 2, we measure the variance of the Bagged Richardson estimator as a function of the number of bags $K$. The variance is computed over 20 runs and repeated 10 times. We observe a clear $1/K$ decay in the variance overhead, which converges asymptotically to the variance floor of the standard Sinkhorn estimator $\widehat{S}_{\varepsilon_{2n}, 2n}$ (dashed line). This confirms that bagging eliminates the stability cost of extrapolation.
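
The sensitivity to $d$ reported in Figure 2 can be illustrated with a generic Richardson extrapolation. This is a sketch, not the paper's Diagonal Richardson scheme: the excerpt does not give the exact weights or the regularization schedule, so the code assumes a single leading bias term $C\,n^{-\alpha}$ with $\alpha = k/d$, where the constant `k` and all function names are hypothetical.

```python
def richardson(t_n, t_2n, alpha):
    """Cancel a leading bias term C * n**(-alpha) using estimates at n and 2n."""
    r = 2.0 ** alpha
    return (r * t_2n - t_n) / (r - 1.0)

def synthetic_estimate(n, truth=1.0, bias_const=3.0, d_true=5, k=2.0):
    """Hypothetical biased estimator: truth + C * n**(-k / d_true), no noise."""
    return truth + bias_const * n ** (-k / d_true)

n = 1000
t_n, t_2n = synthetic_estimate(n), synthetic_estimate(2 * n)

# Extrapolate with several candidate dimensions; only d = 5 matches the true rate.
errors = {d: abs(richardson(t_n, t_2n, alpha=2.0 / d) - 1.0) for d in (2, 5, 10)}
for d, e in errors.items():
    print(f"d = {d:2d}  abs error = {e:.3e}")
```

With the correct exponent the leading bias cancels exactly in this noise-free model, while a mis-specified $d$ leaves a residual bias. Real estimators also carry variance, which extrapolation amplifies; that is the overhead the bagging variant in Figure 3 is reported to reduce.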

Theorems & Definitions (15)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Theorem 4.2
  • Remark 5.1
  • Proposition 5.3
  • Proposition 5.4: Variance Stability
  • Proposition 1.1: Prop.5 and Corollary 1 WeedSharpRates
  • Proposition 1.2: Local Lower Bound for Wasserstein Rates
  • ...and 5 more