Neural Entropic Optimal Transport and Gromov-Wasserstein Alignment
Tao Wang, Ziv Goldfeld
TL;DR
The paper addresses the high statistical and computational costs of optimal transport (OT) and Gromov-Wasserstein (GW) alignment by replacing the Sinkhorn step with a neural estimator that learns dual potentials on mini-batches, enabling scalable estimation of entropic OT (EOT) and entropic GW (EGW). It develops a neural estimation (NE) framework for EOT and extends it to EGW via the EGW variational representation, providing non-asymptotic error bounds of order $O(\text{poly}(1/\epsilon)(k^{-1/2}+n^{-1/2}))$, where $\epsilon$ is the regularization parameter, $k$ is the NN width, and $n$ is the sample size; taking $k$ proportional to $n$ gives the minimax-optimal parametric rate $n^{-1/2}$. The approach yields not only cost estimates but also the corresponding transport/alignment plans through their Gibbs density representation, with guarantees on both cost accuracy and plan approximation. Empirical results on synthetic data and MNIST demonstrate scalability to high dimensions and large sample sizes, supporting the theory and illustrating applicability to large-scale OT/GW tasks.
Abstract
Optimal transport (OT) and Gromov-Wasserstein (GW) alignment are powerful frameworks for geometrically driven matching of probability distributions, yet their large-scale usage is hampered by high statistical and computational costs. Entropic regularization has emerged as a promising solution, allowing parametric convergence rates via the plug-in estimator, which can be computed using the Sinkhorn algorithm (or its iterations in the GW case). However, Sinkhorn's $O(n^2)$ time complexity for an $n$-sized dataset becomes prohibitive for modern, massive datasets. In this work, we propose a new computational framework for the entropic OT and GW problems that replaces the Sinkhorn step with a neural network trained via backpropagation on mini-batches. By shifting the computational load from the entire dataset to the mini-batch, our approach enables reliable estimation of both the optimal transport/alignment cost and plan at dataset sizes and dimensions far exceeding those tractable with standard Sinkhorn methods. We derive non-asymptotic error bounds for these estimates, showing they achieve minimax-optimal parametric convergence rates for compactly supported distributions. Numerical experiments confirm the accuracy of our method in high-dimensional, large-sample regimes where Sinkhorn is infeasible.
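To make the mini-batch idea concrete, the sketch below evaluates the standard semi-dual EOT objective, $\frac{1}{b}\sum_i \varphi(x_i) - \frac{\epsilon}{b'}\sum_j \log\big(\frac{1}{b}\sum_i e^{(\varphi(x_i)-c(x_i,y_j))/\epsilon}\big)$, on a single batch. It is an illustration under stated assumptions, not the authors' implementation: the quadratic cost, the function name `semidual_eot_objective`, and the choice to pass precomputed potential values `phi_x` (which a neural network would supply in practice) are all hypothetical. A neural estimator of this kind would maximize the returned value over the network parameters by stochastic gradient ascent.

```python
import numpy as np


def semidual_eot_objective(phi_x, x, y, eps):
    """Mini-batch semi-dual EOT objective (illustrative sketch).

    phi_x : (b,) values phi(x_i) of a candidate potential (assumed given,
            e.g. the output of a neural network evaluated on the batch).
    x     : (b, d) batch of samples from the first distribution.
    y     : (b2, d) batch of samples from the second distribution.
    eps   : entropic regularization parameter, eps > 0.
    """
    # Assumed cost: c(x, y) = 0.5 * ||x - y||^2, shape (b, b2).
    cost = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)

    # Numerically stable log-mean-exp over the x batch, for each y_j.
    z = (phi_x[:, None] - cost) / eps
    z_max = z.max(axis=0)
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max), axis=0))

    # Entropic c-transform of phi, evaluated at each y_j.
    phi_c = -eps * log_mean_exp

    # Semi-dual value: mean of phi over x plus mean of its transform over y.
    return phi_x.mean() + phi_c.mean()


# Usage on random data with the trivial potential phi = 0.
rng = np.random.default_rng(0)
x_batch = rng.normal(size=(64, 5))
y_batch = rng.normal(size=(64, 5)) + 1.0
value = semidual_eot_objective(np.zeros(64), x_batch, y_batch, eps=0.5)
print(f"semi-dual objective at phi = 0: {value:.4f}")
```

The cost per evaluation depends only on the batch sizes, not on the size of the full dataset, which is the source of the scalability claimed above. The objective is also unchanged when a constant is added to `phi_x`, reflecting the usual shift invariance of dual potentials.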
