Linear-Space Extragradient Methods for Fast, Large-Scale Optimal Transport
Matthew X. Burns, Jiaming Liang
TL;DR
This work tackles large-scale optimal transport under stringent memory constraints by introducing DXG, a dual-only extragradient method that operates in $\mathcal{O}(n)$ memory while achieving near-state-of-the-art iteration complexity for OT. By establishing a dual-structured saddle-point formulation and proving equivalences with entropy-regularized OT, the authors extend the approach to Wasserstein barycenters and provide a CUDA-accelerated implementation. They show how the dual framework can recover primal iterates with linear-space storage, and they give non-asymptotic iteration complexity bounds, including a general parameter regime that yields $\mathcal{O}(\sqrt{n}\,\eta^{-1}\log\frac{n}{\varepsilon})$ iterations. Empirical results demonstrate DXG's strong performance in weakly regularized EOT and $\ell_1$-cost settings. They also identify limitations on barycenter tasks and outline directions for improvement.
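For reference, the block below gives one standard statement of entropy-regularized OT and its dual. The notation is an assumption and may not match the paper's conventions: cost matrix $C$, marginals $r$ and $c$, regularization $\eta$, and entropy $H$. The last formula shows how an optimal plan is recovered from the dual potentials alone. This is why dual methods only need to store $\mathcal{O}(n)$ numbers.

$$
\min_{P \in \mathcal{U}(r,c)} \; \langle C, P \rangle - \eta H(P),
\qquad \mathcal{U}(r,c) = \{ P \in \mathbb{R}_{+}^{n \times n} : P\mathbf{1} = r,\; P^{\top}\mathbf{1} = c \},
\qquad H(P) = -\sum_{i,j} P_{ij}(\log P_{ij} - 1),
$$

$$
\max_{f, g \in \mathbb{R}^{n}} \; \langle f, r \rangle + \langle g, c \rangle - \eta \sum_{i,j} \exp\!\Big(\frac{f_i + g_j - C_{ij}}{\eta}\Big),
\qquad P^{\star}_{ij} = \exp\!\Big(\frac{f^{\star}_i + g^{\star}_j - C_{ij}}{\eta}\Big).
$$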
Abstract
Optimal transport (OT) and its entropy-regularized form (EOT) have become increasingly prominent computational problems, with applications in machine learning and statistics. Recent years have seen a commensurate surge in first-order methods aiming to improve the complexity of large-scale (E)OT. However, there has been a consistent tradeoff: attaining state-of-the-art rates requires $\mathcal{O}(n^2)$ storage to enable ergodic primal averaging. In this work, we demonstrate that recently proposed primal-dual extragradient methods (PDXG) can be implemented entirely in the dual with $\mathcal{O}(n)$ storage. Additionally, we prove that regularizing the reformulated OT problem is equivalent to EOT, and we extend this equivalence to entropy-regularized barycenter problems, further widening the applications of the proposed method. The proposed dual-only extragradient method (DXG) achieves $\mathcal{O}(n^2\varepsilon^{-1})$ complexity for $\varepsilon$-approximate OT with $\mathcal{O}(n)$ memory. Numerical experiments demonstrate that the dual extragradient method scales favorably in unregularized and weakly regularized regimes compared to existing algorithms, though future work is needed to improve performance in certain problem classes.
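The sketch below illustrates why a dual-only method can avoid $\mathcal{O}(n^2)$ storage. It assumes the standard entropic parameterization, in which a transport plan is represented implicitly by dual potentials $f, g$ via $P_{ij} = \exp((f_i + g_j - C_{ij})/\eta)$. It also assumes the $\ell_1$ cost is recomputed from point clouds on demand. The function name `implicit_plan_marginals` and this parameterization are hypothetical. This is not DXG and not the authors' CUDA implementation. It only streams one row of $P$ at a time to accumulate the marginals and the transport cost.

```python
import numpy as np

def implicit_plan_marginals(x, y, f, g, eta):
    """Row sums, column sums, and <C, P> of the implicit plan
    P_ij = exp((f_i + g_j - C_ij) / eta), without ever forming P or C.

    Hypothetical helper (illustration only, not the paper's DXG).
    x: (n, d) source points, y: (m, d) target points,
    f: (n,) and g: (m,) dual potentials, eta > 0 regularization.
    C_ij = ||x_i - y_j||_1 is recomputed one row at a time, so working
    memory grows linearly in n + m (for fixed dimension d).
    """
    n, m = x.shape[0], y.shape[0]
    row_sums = np.empty(n)   # P @ 1, length n
    col_sums = np.zeros(m)   # P^T @ 1, length m
    cost = 0.0               # running <C, P>
    for i in range(n):
        c_row = np.abs(x[i] - y).sum(axis=1)      # l1 cost row C_i, shape (m,)
        p_row = np.exp((f[i] + g - c_row) / eta)  # row i of the implicit plan
        row_sums[i] = p_row.sum()
        col_sums += p_row
        cost += p_row @ c_row
    return row_sums, col_sums, cost
```

The row loop is the simplest way to keep memory linear. A practical GPU version would likely process rows in chunks. For small $\eta$, it would also evaluate the exponent in the log domain to avoid overflow.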
