Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods
Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen
TL;DR
This work tackles scalable computation of the optimal transport (OT) distance by recasting the problem as a bilinear minimax problem and solving it with an entropy-regularized extragradient method. The algorithm combines adaptive learning rates with an entropy-augmented objective to accelerate convergence, yielding a runtime of $\widetilde{O}(n^2/\varepsilon)$ for $\varepsilon$-accurate OT with $O(n^2)$ memory, where $n$ is the dimension of the probability distributions. Theoretical guarantees are complemented by numerical experiments showing performance competitive with or superior to Sinkhorn, Greenkhorn, and recent first-order methods across diverse datasets and cost structures. The approach integrates penalization, a minimax reformulation, entropy regularization, and extragradient updates to deliver a practical, provably fast OT solver for large-scale applications. The results are relevant to machine learning pipelines that rely on OT distances, including generative modeling, domain adaptation, and distributional analysis, where scalable and accurate transport computations are essential.
Abstract
Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine that empowers various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy with runtime $\widetilde{O}( n^2/\varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves the state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm design are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea, in conjunction with entropy regularization and adaptive learning rates, to accelerate convergence.
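To make the two key elements more concrete, the following is a minimal sketch of an entropy-regularized extragradient iteration on a generic bilinear minimax problem $\min_x \max_y x^\top A y$ over probability simplices. It is not the paper's OT-specific algorithm: the function name `entropy_regularized_extragradient`, the regularization weight `tau`, and the fixed step size `eta` are assumptions made for illustration, and the adaptive learning rates and the OT-to-minimax conversion mentioned above are omitted.

```python
import numpy as np


def _normalize_from_logits(logits):
    """Map unnormalized log-weights to a point on the probability simplex."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(shifted)
    return weights / weights.sum()


def entropy_regularized_extragradient(A, x0, y0, tau=0.1, eta=0.1, num_iters=2000):
    """Illustrative solver for min_x max_y x^T A y + tau * (H(y) - H(x)).

    x and y live on probability simplices and H is the Shannon entropy.
    Hypothetical helper: fixed step size eta, strictly positive x0 and y0.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    for _ in range(num_iters):
        # Entropy regularization shrinks the log-weights toward uniform.
        log_x = (1.0 - eta * tau) * np.log(x)
        log_y = (1.0 - eta * tau) * np.log(y)
        # Extrapolation step: a look-ahead point from the current gradients.
        x_mid = _normalize_from_logits(log_x - eta * (A @ y))
        y_mid = _normalize_from_logits(log_y + eta * (A.T @ x))
        # Main step: restart from the current iterate, but use the
        # gradients evaluated at the look-ahead point.
        x = _normalize_from_logits(log_x - eta * (A @ y_mid))
        y = _normalize_from_logits(log_y + eta * (A.T @ x_mid))
    return x, y
```

As a sanity check under these assumptions, running the sketch on the matching-pennies payoff matrix `[[1, -1], [-1, 1]]` from a non-uniform starting point should drive both players toward the uniform distribution, which is the regularized equilibrium of that game by symmetry.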
