Neural Entropic Optimal Transport and Gromov-Wasserstein Alignment

Tao Wang, Ziv Goldfeld

TL;DR

The paper addresses the high statistical and computational costs of optimal transport (OT) and Gromov-Wasserstein (GW) matching by replacing the Sinkhorn step with a neural estimator that learns dual potentials on mini-batches, enabling scalable estimation of entropic OT (EOT) and entropic GW (EGW). It develops a neural estimation (NE) framework for EOT and extends it to EGW via the EGW variational representation, providing non-asymptotic, minimax-optimal error bounds that scale as $O(\mathrm{poly}(1/\varepsilon)(k^{-1/2}+n^{-1/2}))$ when the NN width $k$ matches the sample size $n$. The approach yields not only cost estimates but also the corresponding transport/alignment plans through their Gibbs density form, with guarantees on both cost accuracy and plan approximation. Empirical results on synthetic data and MNIST demonstrate scalability to high dimensions and large samples, validating the theory and illustrating practical applicability to real-world large-scale OT/GW tasks.

Abstract

Optimal transport (OT) and Gromov-Wasserstein (GW) alignment are powerful frameworks for geometrically driven matching of probability distributions, yet their large-scale usage is hampered by high statistical and computational costs. Entropic regularization has emerged as a promising solution, allowing parametric convergence rates via the plug-in estimator, which can be computed using the Sinkhorn algorithm (or its iterations in the GW case). However, Sinkhorn's $O(n^2)$ time complexity for an $n$-sized dataset becomes prohibitive for modern, massive datasets. In this work, we propose a new computational framework for the entropic OT and GW problems that replaces the Sinkhorn step with a neural network trained via backpropagation on mini-batches. By shifting the computational load from the entire dataset to the mini-batch, our approach enables reliable estimation of both the optimal transport/alignment cost and plan at dataset sizes and dimensions far exceeding those tractable with standard Sinkhorn methods. We derive non-asymptotic error bounds for these estimates, showing they achieve minimax-optimal parametric convergence rates for compactly supported distributions. Numerical experiments confirm the accuracy of our method in high-dimensional, large-sample regimes where Sinkhorn is infeasible.
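To make the idea of "a neural network trained on mini-batches in place of Sinkhorn" concrete, here is a minimal sketch. It assumes the entropic semi-dual objective with squared Euclidean cost, a shallow ReLU network for the potential, and plain mini-batch gradient ascent with hand-written gradients. The function names, network width, step size, and batch size are illustrative assumptions; the paper's actual architecture and optimizer are not specified in this excerpt.

```python
import numpy as np

def cost(x, y):
    # Squared Euclidean cost matrix, c[i, j] = ||x_i - y_j||^2 (assumed cost).
    return ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

def potential(params, x):
    # Shallow ReLU network phi(x) = relu(x W + b) @ beta; also returns pre-activations.
    W, b, beta = params
    pre = x @ W + b
    return np.maximum(pre, 0.0) @ beta, pre

def semi_dual(phi, c, eps):
    # Empirical semi-dual: mean_i phi_i + mean_j phi_c_j, where
    # phi_c_j = -eps * log mean_i exp((phi_i - c_ij) / eps).
    n = phi.shape[0]
    z = (phi[:, None] - c) / eps
    zmax = z.max(axis=0)
    w = np.exp(z - zmax)                        # stabilised exponentials
    phi_c = -eps * (zmax + np.log(w.mean(axis=0)))
    soft = w / w.sum(axis=0)                    # softmax over i, per column j
    grad_phi = 1.0 / n - soft.mean(axis=1)      # d(objective) / d(phi_i)
    return phi.mean() + phi_c.mean(), phi_c, grad_phi

def gibbs_density(phi, phi_c, c, eps):
    # Density of the induced plan with respect to the product measure.
    return np.exp((phi[:, None] + phi_c[None, :] - c) / eps)

def train(x, y, eps=1.0, width=16, batch=128, steps=300, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(x.shape[1], width))
    b = np.zeros(width)
    beta = np.zeros(width)
    for _ in range(steps):
        # Each step touches only a mini-batch, never the full n x n cost matrix.
        xb = x[rng.choice(len(x), batch, replace=False)]
        yb = y[rng.choice(len(y), batch, replace=False)]
        phi, pre = potential((W, b, beta), xb)
        _, _, g = semi_dual(phi, cost(xb, yb), eps)
        # Manual backpropagation through the shallow network.
        delta = g[:, None] * ((pre > 0) * beta)
        grad_beta = np.maximum(pre, 0.0).T @ g
        W = W + lr * (xb.T @ delta)
        b = b + lr * delta.sum(axis=0)
        beta = beta + lr * grad_beta
    return W, b, beta

# Usage on synthetic data (hypothetical setup, chosen only for illustration).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=(512, 2))
y = 0.5 * rng.uniform(-1, 1, size=(512, 2)) + np.array([1.0, 0.0])
params = train(x, y)
phi, _ = potential(params, x)
value, phi_c, _ = semi_dual(phi, cost(x, y), 1.0)
print("estimated EOT cost (semi-dual value):", value)
```

The per-step cost depends on the batch size rather than on the dataset size, which is the source of the scalability claimed above. The Gibbs density returned by `gibbs_density` is how a learned potential yields a plan as well as a cost.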

Paper Structure

This paper contains 29 sections, 16 theorems, 115 equations, 3 figures, 1 algorithm.

Key Result

Lemma 1

Fix $\varepsilon>0$, $(\mu,\nu)\in\mathcal{P}_4(\mathbb{R}^{d_x})\times\mathcal{P}_4(\mathbb{R}^{d_y})$ with zero mean, and let $M_{\mu,\nu}\coloneqq \sqrt{M_2(\mu)M_2(\nu)}$. Then, […] where $\mathsf{OT}_{\mathbf{A}}^{\varepsilon}$ is the EOT problem with cost $c_{\mathbf{A}}:(x,y)\in\mathbb{R}^{d_x}\times \mathbb{R}^{d_y}\mapsto-4\|x\|^2\|y\|^2-32x^{\intercal}\mathbf{A} y$. Moreover, […]
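As a concrete reading of the cost $c_{\mathbf{A}}$ in Lemma 1, the sketch below evaluates it for all pairs of two samples. The function name `cost_A`, the sample sizes, and the random matrix `A` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cost_A(x, y, A):
    """Pairwise c_A(x_i, y_j) = -4 ||x_i||^2 ||y_j||^2 - 32 x_i^T A y_j."""
    sq_x = (x ** 2).sum(axis=1)          # ||x_i||^2, shape (n,)
    sq_y = (y ** 2).sum(axis=1)          # ||y_j||^2, shape (m,)
    return -4.0 * np.outer(sq_x, sq_y) - 32.0 * (x @ A @ y.T)

# Hypothetical example: d_x = 3, d_y = 2, so A is a 3 x 2 matrix.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=(4, 2))
A = rng.normal(size=(3, 2))
C = cost_A(x, y, A)                      # shape (5, 4)
```

Note that $\mathbf{A}$ couples the two spaces even when $d_x \neq d_y$, which is why the cost is well defined for samples living in different dimensions.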

Figures (3)

  • Figure 1: Illustration of optimal plan for the OT and GW problems between $\mu$ and $\nu$, with $\Pi(\mu,\nu)$ designating the set of all their couplings.
  • Figure 2: Neural Estimation of EGW alignment: (a) Relative error for the case where $\mu=\nu=\mathrm{Unif}([-1/\sqrt{d},1/\sqrt{d}]^d)$; (b) Relative error for $\mu,\nu$ as centered Gaussian distributions with randomly generated covariance matrices; (c) Learned neural alignment plan (in red) versus the true optimal GW alignment (whose density is represented by the black contour lines).
  • Figure 3: Neural Estimation of EIGW on MNIST: (a) Testing for the orthogonal invariance of the EIGW distance by estimating the gap $|\mathsf{IGW}^\varepsilon (\mu,\mu)-\mathsf{IGW}^\varepsilon (\mathbf{U}_{\sharp}\mu,\mathbf{V}_{\sharp}\mu)|$, for $\mu$ as the empirical MNIST distribution and $(\mathbf{U},\mathbf{V})$ two orthogonal matrices; (b) Capturing visual similarities between digits by estimating the EIGW distance between different MNIST digits.
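The orthogonal invariance tested in Figure 3(a) can be motivated with a small check: an inner-product GW cost sees each dataset only through its pairwise inner products, and these are unchanged when every point is mapped through an orthogonal matrix. The sketch below verifies this for a random sample; the helper names and sizes are hypothetical, and it checks only the Gram-matrix identity, not the EIGW estimator itself.

```python
import numpy as np

def random_orthogonal(d, rng):
    # Q factor of a Gaussian matrix is orthogonal (Q^T Q = I).
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q

def gram(x):
    # Pairwise inner products <x_i, x_j>.
    return x @ x.T

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))              # six points in R^4
U = random_orthogonal(4, rng)
x_rot = x @ U.T                          # rows are U x_i

# The Gram matrix, and hence any cost built from it, is unchanged.
invariant = np.allclose(gram(x), gram(x_rot))
```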

Theorems & Definitions (26)

  • Lemma 1: EGW duality; Theorem 1 in zhang2024gromov
  • Remark 1: Inner product cost
  • Theorem 1: EOT cost neural estimation; bound 1
  • proof
  • Lemma 2: Approximation error bound
  • Lemma 3: Estimation error
  • Remark 2: Minimax optimality
  • Remark 3: Almost explicit expression for $C$
  • Theorem 2: EOT cost neural estimation; bound 2
  • Theorem 3: EOT plan neural estimation
  • ...and 16 more