Linear-Space Extragradient Methods for Fast, Large-Scale Optimal Transport

Matthew X. Burns, Jiaming Liang

TL;DR

This work tackles large-scale optimal transport under stringent memory constraints by introducing DXG, a dual-only extragradient method that operates in $\mathcal{O}(n)$ memory while achieving near-state-of-the-art iteration complexity for OT. By establishing a dual-structured saddle-point formulation and proving equivalences with entropy-regularized OT, the authors extend the approach to Wasserstein barycenters and provide a CUDA-accelerated implementation. They show how the dual framework can recover primal iterates with linear-space storage and present convergence guarantees with non-asymptotic iteration complexity bounds, including a general parameter regime yielding $\mathcal{O}(\sqrt{n}\,\eta^{-1}\log\frac{n}{\varepsilon})$ iterations. Empirical results demonstrate DXG’s strong performance in weakly regularized EOT and $\ell_1$-cost settings, while also identifying limitations in barycenter tasks and outlining future directions for improvement.

Abstract

Optimal transport (OT) and its entropy-regularized form (EOT) have become increasingly prominent computational problems, with applications in machine learning and statistics. Recent years have seen a commensurate surge in first-order methods aiming to improve the complexity of large-scale (E)OT. However, there has been a consistent tradeoff: attaining state-of-the-art rates requires $\mathcal{O}(n^2)$ storage to enable ergodic primal averaging. In this work, we demonstrate that recently proposed primal-dual extragradient methods (PDXG) can be implemented entirely in the dual with $\mathcal{O}(n)$ storage. Additionally, we prove that regularizing the reformulated OT problem is equivalent to EOT with extensions to entropy-regularized barycenter problems, further widening the applications of the proposed method. The proposed dual-only extragradient method (DXG) achieves $\mathcal{O}(n^2\varepsilon^{-1})$ complexity for $\varepsilon$-approximate OT with $\mathcal{O}(n)$ memory. Numerical experiments demonstrate that the dual extragradient method scales favorably in non/weakly-regularized regimes compared to existing algorithms, though future work is needed to improve performance in certain problem classes.
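To make the storage tradeoff concrete, the following is a minimal sketch of how quantities of a transport plan can be evaluated from dual variables without ever holding the full $n\times n$ matrix. It is not the DXG algorithm itself. It assumes the standard entropic plan form $\pi_{ij}=\exp((u_i+v_j-C_{ij})/\eta)$, an $\ell_1$ ground cost computed on the fly from point coordinates, and a hypothetical helper name `plan_marginals`; the block size is an illustrative parameter.

```python
import numpy as np

def plan_marginals(x, y, u, v, eta, block=256):
    """Row and column sums of pi_ij = exp((u_i + v_j - C_ij) / eta).

    Assumption (not from the source): C_ij = ||x_i - y_j||_1 is recomputed
    per block from the coordinates, so only one (block x m) slab of the plan
    exists at any time and persistent storage stays linear in n + m.
    """
    n, m = x.shape[0], y.shape[0]
    row = np.zeros(n)
    col = np.zeros(m)
    for s in range(0, n, block):
        e = min(s + block, n)
        # (e - s, m) slab of the l1 cost matrix
        cost = np.abs(x[s:e, None, :] - y[None, :, :]).sum(axis=-1)
        # matching slab of the plan implied by the dual variables
        slab = np.exp((u[s:e, None] + v[None, :] - cost) / eta)
        row[s:e] = slab.sum(axis=1)
        col += slab.sum(axis=0)
    return row, col
```

With a fixed block size, the temporary slab costs memory proportional to $n$ rather than $n^2$, at the price of recomputing cost entries each time they are needed.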

Paper Structure

This paper contains 26 sections, 26 theorems, 200 equations, 6 figures, 1 table, 4 algorithms.

Key Result

Lemma 1

If $r, c\in\Delta^{n}$ and $\pi\in\mathbb{R}_{+}^{n\times n}$, then the rounding algorithm (Algorithm alg:round) takes $\mathcal{O}(n^2)$ time to output a matrix $\tilde{\pi}\in \Pi(r,c)$ satisfying
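A rounding step of this kind can be sketched as follows. This is a minimal NumPy sketch that assumes the scheme is the familiar three-stage one associated with the cited Altschuler et al. (2017) reference: scale down rows that exceed $r$, scale down columns that exceed $c$, then add a rank-one correction that fills the remaining deficit. The function name `round_to_polytope` is hypothetical, and the sketch is not confirmed to match the paper's algorithm step for step.

```python
import numpy as np

def round_to_polytope(pi, r, c):
    """Map a nonnegative matrix pi to a matrix with row sums r and column sums c.

    Assumed scheme (row scaling, column scaling, rank-one correction);
    each stage touches every entry once, so the cost is O(n^2) time.
    """
    pi = np.asarray(pi, dtype=float)
    tiny = 1e-300  # guards against division by zero for empty rows/columns

    # Stage 1: shrink rows whose mass exceeds the target r.
    row = pi.sum(axis=1)
    f = pi * np.minimum(1.0, r / np.maximum(row, tiny))[:, None]

    # Stage 2: shrink columns whose mass exceeds the target c.
    col = f.sum(axis=0)
    f = f * np.minimum(1.0, c / np.maximum(col, tiny))[None, :]

    # Stage 3: both deficits are nonnegative and have equal total mass,
    # so a rank-one term restores the marginals exactly.
    err_r = r - f.sum(axis=1)
    err_c = c - f.sum(axis=0)
    total = err_r.sum()
    if total > 0:
        f = f + np.outer(err_r, err_c) / total
    return f
```

Because the first two stages only remove mass, the deficits in the last stage are entrywise nonnegative, which keeps the output nonnegative.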

Figures (6)

  • Figure 1: EOT objective gap with varying regularization strength $\eta$ on $N=1024$ instances from the DOTmark repository.
  • Figure 2: EOT objective gap with varying regularization strength $\eta$ on $N=1024$ instances from the DOTmark repository.
  • Figure 3: EOT objective gap with varying problem size $N$ on a sample DOTmark problem with varying distance measures.
  • Figure 4: [Top] Infeasibility and function value comparisons in color transfer tasks. [Bottom] Visual comparison of $512\times 512$ color transfer results for Sinkhorn (left) and DXG (right) with a 4 hour timeout. Note that Sinkhorn exhibits significantly more visual artifacts than the DXG transfer owing to high plan infeasibility.
  • Figure 5:
  • ...and 1 more figure

Theorems & Definitions (48)

  • Lemma 1: altschulerNearlinearTimeApproximation2017
  • Lemma 2
  • proof
  • Lemma 3: peyre2019computational
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Proposition 1
  • Lemma 6: nocedalNumericalOptimization2006
  • ...and 38 more