Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods

Gen Li, Yanxi Chen, Yu Huang, Yuejie Chi, H. Vincent Poor, Yuxin Chen

TL;DR

This work tackles scalable computation of the OT distance by recasting the problem into a bilinear minimax form and solving it with an entropy-regularized extragradient method. The algorithm employs adaptive learning rates and an entropy-augmented objective to achieve fast convergence, yielding a runtime of $\widetilde{O}(n^2/\varepsilon)$ for $\varepsilon$-accurate OT, with $O(n^2)$ memory. Theoretical guarantees are complemented by extensive numerical experiments showing performance competitive with or superior to Sinkhorn, Greenkhorn, and recent first-order methods across diverse datasets and cost structures. The approach integrates penalization, minimax reformulation, entropy regularization, and extragradient updates to deliver a practical, provably fast OT solver suitable for large-scale applications with probability distributions of size $n$. The results have potential impact for machine learning pipelines relying on OT distances, including generative modeling, domain adaptation, and distributional analysis, where scalable and accurate transport computations are essential.
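The extragradient idea can be illustrated on a generic bilinear matrix game $\min_{x}\max_{y} x^\top A y$ over probability simplices, with entropy regularization of weight $\tau$ added for both players. The sketch below is a minimal, hypothetical illustration, not the paper's OT solver: it assumes a fixed step size `eta` and a fixed `tau` (the paper uses adaptive learning rates), and all function names are our own. Each iteration first takes an entropy-regularized multiplicative-weights step to a midpoint, then updates the current iterate using gradients evaluated at that midpoint.

```python
import math

# Hypothetical sketch: entropy-regularized extragradient for a generic matrix game
#     min_x max_y  x^T A y - tau*H(x) + tau*H(y),   x, y in probability simplices,
# with a FIXED step size eta (the paper's adaptive learning rates are not reproduced).

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [v / s for v in w]

def reg_mirror_step(p, grad, eta, tau):
    # KL-mirror (multiplicative-weights) step on the regularized objective:
    #     p_new ∝ p^(1 - eta*tau) * exp(-eta * grad)
    return softmax([(1 - eta * tau) * math.log(pi) - eta * g for pi, g in zip(p, grad)])

def matvec(A, y):
    # A y
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]

def rmatvec(A, x):
    # A^T x
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]

def extragradient(A, x, y, eta=0.1, tau=0.1, iters=2000):
    for _ in range(iters):
        # Extrapolation: step from (x, y) using gradients at (x, y).
        x_mid = reg_mirror_step(x, matvec(A, y), eta, tau)
        y_mid = reg_mirror_step(y, [-g for g in rmatvec(A, x)], eta, tau)  # y ascends
        # Update: step from (x, y) again, but with gradients at the midpoint.
        x, y = (reg_mirror_step(x, matvec(A, y_mid), eta, tau),
                reg_mirror_step(y, [-g for g in rmatvec(A, x_mid)], eta, tau))
    return x, y

# Usage: matching pennies, whose (regularized) equilibrium is uniform for both players.
x, y = extragradient([[1.0, -1.0], [-1.0, 1.0]], [0.9, 0.1], [0.2, 0.8])
print([round(v, 4) for v in x], [round(v, 4) for v in y])
```

On this small example both strategies should approach $(1/2, 1/2)$. Evaluating the second step's gradients at the midpoint, rather than at the current iterate, is what distinguishes extragradient from plain mirror descent-ascent, which can cycle on bilinear games.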

Abstract

Efficient computation of the optimal transport distance between two distributions serves as an algorithmic subroutine in various applications. This paper develops a scalable first-order optimization-based method that computes optimal transport to within $\varepsilon$ additive accuracy with runtime $\widetilde{O}( n^2/\varepsilon)$, where $n$ denotes the dimension of the probability distributions of interest. Our algorithm achieves the state-of-the-art computational guarantees among all first-order methods, while exhibiting favorable numerical performance compared to classical algorithms like Sinkhorn and Greenkhorn. Underlying our algorithm design are two key elements: (a) converting the original problem into a bilinear minimax problem over probability distributions; (b) exploiting the extragradient idea, in conjunction with entropy regularization and adaptive learning rates, to accelerate convergence.
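One standard way a penalized OT objective becomes a bilinear minimax problem uses the identity $\|v\|_1 = \max_{u \in [-1,1]^n} u^\top v$. The sketch below assumes, purely for illustration, an $\ell_1$ penalty of hypothetical weight `lam` on the marginal violations $P\mathbf{1}-\mu$ and $P^\top\mathbf{1}-\nu$; the exact reformulation in the paper (which places the problem over probability distributions) may differ and is not reproduced. It checks that maximizing the bilinear form over the box recovers the penalized objective.

```python
# Hypothetical sketch, assuming an l1 penalty with weight lam on the marginal violations.

def marginals(P):
    rows = [sum(row) for row in P]        # P 1
    cols = [sum(col) for col in zip(*P)]  # P^T 1
    return rows, cols

def inner(W, P):
    # Frobenius inner product <W, P>
    return sum(w * p for w_row, p_row in zip(W, P) for w, p in zip(w_row, p_row))

def penalized_objective(W, P, mu, nu, lam):
    # <W, P> + lam * (||P 1 - mu||_1 + ||P^T 1 - nu||_1)
    rows, cols = marginals(P)
    viol = (sum(abs(a - b) for a, b in zip(rows, mu))
            + sum(abs(a - b) for a, b in zip(cols, nu)))
    return inner(W, P) + lam * viol

def bilinear_objective(W, P, mu, nu, lam, u, v):
    # <W, P> + lam * (u^T (P 1 - mu) + v^T (P^T 1 - nu)),  u, v in [-1, 1]^n;
    # bilinear in P and (u, v), up to linear terms.
    rows, cols = marginals(P)
    return inner(W, P) + lam * (sum(ui * (a - b) for ui, a, b in zip(u, rows, mu))
                                + sum(vj * (a - b) for vj, a, b in zip(v, cols, nu)))

def sign(t):
    return 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)

def best_response(P, mu, nu):
    # A linear function over the box [-1, 1]^n is maximized at a sign vector.
    rows, cols = marginals(P)
    return ([sign(a - b) for a, b in zip(rows, mu)],
            [sign(a - b) for a, b in zip(cols, nu)])
```

For any plan `P`, `bilinear_objective` evaluated at `best_response(P, mu, nu)` equals `penalized_objective`, and any other `u`, `v` in the box gives a value no larger.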

Paper Structure (35 sections, 2 theorems, 86 equations, 5 figures, 1 table, 3 algorithms)

Key Result

Lemma 1

For any non-negative matrix $\widehat{\bm{P}} \in \mathbb{R}^{n\times n}_+$, there exists a fast algorithm (see altschuler2017near) that finds, with $O(n^2)$ computational complexity, a probability matrix $\widetilde{\bm{P}}\in \Delta_{n\times n}$ such that
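The rounding routine that Lemma 1 cites from altschuler2017near can be sketched as follows, as we understand the cited procedure: it maps a non-negative matrix to one whose row and column sums match prescribed targets. This is a minimal illustration under stated assumptions: the target marginals (called `r` and `c` here, hypothetical names) are non-negative vectors each summing to one, and `round_to_marginals` is our own naming, not an API from the paper. It scales rows down to their targets, then columns, and finally adds a rank-one correction; each pass touches every entry a constant number of times, in line with the $O(n^2)$ cost stated in the lemma.

```python
# Hypothetical sketch of a rounding step in the style of altschuler2017near.
# Assumes r and c are non-negative and each sums to 1.

def round_to_marginals(F, r, c):
    n, m = len(F), len(F[0])
    # 1) Scale each row down so that no row sum exceeds its target r_i.
    row_sums = [sum(row) for row in F]
    x = [min(r[i] / row_sums[i], 1.0) if row_sums[i] > 0 else 1.0 for i in range(n)]
    G = [[x[i] * F[i][j] for j in range(m)] for i in range(n)]
    # 2) Scale each column down so that no column sum exceeds its target c_j
    #    (this can only decrease row sums, so they stay below r).
    col_sums = [sum(G[i][j] for i in range(n)) for j in range(m)]
    y = [min(c[j] / col_sums[j], 1.0) if col_sums[j] > 0 else 1.0 for j in range(m)]
    G = [[G[i][j] * y[j] for j in range(m)] for i in range(n)]
    # 3) Both deficits are non-negative and have the same total mass, so a
    #    rank-one correction err_r err_c^T / ||err_c||_1 fixes all marginals.
    err_r = [r[i] - sum(G[i]) for i in range(n)]
    err_c = [c[j] - sum(G[i][j] for i in range(n)) for j in range(m)]
    total = sum(err_c)
    if total > 0:
        G = [[G[i][j] + err_r[i] * err_c[j] / total for j in range(m)] for i in range(n)]
    return G

# Usage: a non-negative matrix whose marginals do not match the targets.
G = round_to_marginals([[0.5, 0.2], [0.1, 0.4]], [0.3, 0.7], [0.6, 0.4])
```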

Figures (5)

  • Figure 1: Empirical comparisons of various algorithms under different settings. Each curve is an average over 10 independent trials. The first and third rows use the number of matrix-vector products as a measure of computational cost, while the second and fourth use the actual runtime. Here, we take $C_1=C_2=1$, and the Y-axis represents the sub-optimality gap $\langle \bm{W}, \bm{P} \rangle - \langle \bm{W}, \bm{P}^{\star} \rangle$.
  • Figure 2: Empirical comparisons of algorithms using the 2-Wasserstein distance.
  • Figure 3: The proposed extragradient method (the main algorithm) with the adjustment step vs. the version without it. The problem settings are the same as those in Figure 1.
  • Figure 4: Empirical comparisons between our extragradient method and two recently proposed algorithms, namely the dual extrapolation method jambulapati2019direct and the DROT method mai2022a. Each curve is an average over 10 independent trials.
  • Figure 5: Empirical comparisons of our extragradient method and two cost-free acceleration strategies, namely, the overrelaxation method Lehmann2021note and the batching Greenkhorn method kostic2022batch. Each curve represents an average over 10 independent trials.
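The quantities plotted in Figures 1 and 2 can be computed in a few lines. The sketch below assumes point clouds `xs` and `ys` given as lists of coordinate tuples, and an optimal plan `P_star` already obtained from some exact solver (both hypothetical inputs; the experiments' data and reference solver are not described here). For the 2-Wasserstein setting, the cost matrix is the squared Euclidean distance, so the optimal value $\langle \bm{W}, \bm{P}^{\star} \rangle$ is the squared 2-Wasserstein distance.

```python
# Hypothetical helpers; xs, ys and P_star are illustrative inputs.

def squared_euclidean_cost(xs, ys):
    # W[i][j] = ||x_i - y_j||^2
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]

def transport_cost(W, P):
    # Frobenius inner product <W, P> = sum_ij W_ij * P_ij
    return sum(w * p for w_row, p_row in zip(W, P) for w, p in zip(w_row, p_row))

def suboptimality_gap(W, P, P_star):
    # <W, P> - <W, P*>, the Y-axis quantity in Figure 1
    return transport_cost(W, P) - transport_cost(W, P_star)

# Usage: two points on a line with uniform marginals.
W = squared_euclidean_cost([(0.0,), (1.0,)], [(0.0,), (1.0,)])
P_indep = [[0.25, 0.25], [0.25, 0.25]]  # independent coupling
P_star = [[0.5, 0.0], [0.0, 0.5]]       # optimal (identity) coupling
gap = suboptimality_gap(W, P_indep, P_star)
```

Here the independent coupling pays cost $1/2$ while the optimal coupling pays $0$, so the gap is $1/2$ and the 2-Wasserstein distance between the two identical point clouds is $0$.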

Theorems & Definitions (3)

  • Lemma 1
  • Theorem 1
  • Remark 1