Fast and Large-Scale Unbalanced Optimal Transport via its Semi-Dual and Adaptive Gradient Methods
Ferdinand Genans
TL;DR
This work tackles scalable optimization for Unbalanced OT (UOT) by analyzing the entropic semi-dual. It shows that, near the optimizer, the semi-dual has a local condition number scaling as O(1/ε), independent of the problem size n, which makes adaptive first-order methods effective. The authors develop PASGD for semi-discrete problems, with a convergence rate of O(n/(ε T)), and design ANAG for the discrete full-batch setting, with a near-optimal local complexity of O(n^2 sqrt(1/ε) log(1/δ)). They provide a rigorous treatment of global and local curvature, generalized self-concordance, and data-dependent smoothness, along with extensive numerical experiments on color transfer and semi-discrete tasks. Overall, the paper delivers scalable, theory-backed solvers for large-scale UOT and highlights the practical benefits of using a χ^2 target divergence rather than KL in the semi-dual.
Abstract
Unbalanced Optimal Transport (UOT) has emerged as a robust relaxation of standard Optimal Transport, particularly effective for handling outliers and mass variations. However, scalable algorithms for UOT, specifically those based on Stochastic Gradient Descent (SGD), remain largely underexplored. In this work, we address this gap by analyzing the semi-dual formulation of Entropic UOT and demonstrating its suitability for adaptive gradient methods. While the semi-dual is a standard tool for large-scale balanced OT, its geometry in the unbalanced setting appears ill-conditioned under standard analysis. Specifically, worst-case bounds on the marginal penalties using the $\chi^2$ divergence suggest a condition number scaling as $n/\varepsilon$, implying poor scalability. In contrast, we show that the local condition number actually scales as $\mathcal{O}(1/\varepsilon)$, removing the ill-conditioned dependence on $n$. Exploiting this property, we prove that SGD methods adapt to this local curvature and achieve a convergence rate of $\mathcal{O}(n/(\varepsilon T))$ in the stochastic and online regimes, which makes them suitable for large-scale and semi-discrete applications. Finally, for the full-batch discrete setting, we derive a nearly tight upper bound on the local smoothness that depends solely on the gradient. Using this bound to adapt step sizes, we propose a modified Adaptive Nesterov Accelerated Gradient (ANAG) method on the semi-dual functional and prove that it achieves a local complexity of $\mathcal{O}(n^2\sqrt{1/\varepsilon}\ln(1/\delta))$.
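The abstract does not write out the semi-dual, so what follows is a minimal sketch under stated assumptions rather than the paper's method. It assumes the standard dual of entropic UOT with generalized KL penalties of strengths $\tau_1$ (source) and $\tau_2$ (target), not the $\chi^2$ target penalty the paper advocates. Maximizing that dual in closed form over the source potential leaves a concave function of the target potential $g$ alone: $F(g) = \sum_i a_i(\tau_1+\varepsilon)\big(1 - e^{-s_i(g)/(\tau_1+\varepsilon)}\big) - \tau_2\sum_j b_j\big(e^{-g_j/\tau_2} - 1\big)$, with $s_i(g) = -\varepsilon\log\sum_j b_j e^{(g_j - C_{ij})/\varepsilon}$. Its gradient is $b_j e^{-g_j/\tau_2}$ minus the $j$-th column sum of the induced plan. All function names are hypothetical; `averaged_sgd` is generic averaged SGD with $1/\sqrt{k}$ steps (not PASGD), and `full_batch_ascent` is fixed-step gradient ascent (not ANAG).

```python
import math
import random


def soft_c_transform(g, cost_row, b, eps):
    """Stabilized s_i = -eps * log(sum_j b_j * exp((g_j - C_ij) / eps)).

    Also returns the softmax weights w_ij, which sum to 1 over j.
    """
    z = [math.log(bj) + (gj - cj) / eps for gj, cj, bj in zip(g, cost_row, b)]
    zmax = max(z)
    ez = [math.exp(zj - zmax) for zj in z]
    total = sum(ez)
    return -eps * (zmax + math.log(total)), [e / total for e in ez]


def plan_row(g, cost_row, b, eps, tau1):
    """Row i of the induced plan divided by a_i: exp(-s_i / (tau1 + eps)) * w_ij."""
    s, w = soft_c_transform(g, cost_row, b, eps)
    scale = math.exp(-s / (tau1 + eps))
    return [scale * wj for wj in w]


def semi_dual_value(g, a, b, C, eps, tau1, tau2):
    """Concave semi-dual F(g) under the assumed KL/KL penalties."""
    src = 0.0
    for ai, row in zip(a, C):
        s, _ = soft_c_transform(g, row, b, eps)
        src += ai * (tau1 + eps) * (1.0 - math.exp(-s / (tau1 + eps)))
    tgt = -tau2 * sum(bj * (math.exp(-gj / tau2) - 1.0) for gj, bj in zip(g, b))
    return src + tgt


def semi_dual_grad(g, a, b, C, eps, tau1, tau2):
    """Full gradient: b_j * exp(-g_j / tau2) minus the j-th column sum of the plan."""
    grad = [bj * math.exp(-gj / tau2) for gj, bj in zip(g, b)]
    for ai, row in zip(a, C):
        for j, pij in enumerate(plan_row(g, row, b, eps, tau1)):
            grad[j] -= ai * pij
    return grad


def full_batch_ascent(a, b, C, eps, tau1, tau2, n_iter, step):
    """Plain gradient ascent with a fixed step size (not the paper's ANAG)."""
    g = [0.0] * len(b)
    for _ in range(n_iter):
        grad = semi_dual_grad(g, a, b, C, eps, tau1, tau2)
        g = [gj + step * dj for gj, dj in zip(g, grad)]
    return g


def averaged_sgd(a, b, C, eps, tau1, tau2, n_iter, step0, seed=0):
    """Averaged SGD: sample one source point per step, step size step0 / sqrt(k)."""
    rng = random.Random(seed)
    mass = sum(a)  # rescale so the sampled gradient is unbiased for any total mass
    g = [0.0] * len(b)
    g_avg = [0.0] * len(b)
    for k in range(1, n_iter + 1):
        i = rng.choices(range(len(a)), weights=a)[0]  # i ~ a / mass
        row = plan_row(g, C[i], b, eps, tau1)
        step = step0 / math.sqrt(k)
        g = [gj + step * (bj * math.exp(-gj / tau2) - mass * rj)
             for gj, bj, rj in zip(g, b, row)]
        g_avg = [ga + (gj - ga) / k for ga, gj in zip(g_avg, g)]  # running mean
    return g_avg


if __name__ == "__main__":
    a = [0.5, 0.5]
    b = [0.5, 0.5]
    C = [[(x - y) ** 2 for y in (0.0, 1.0)] for x in (0.0, 1.0)]
    print(full_batch_ascent(a, b, C, 0.5, 1.0, 1.0, n_iter=2000, step=0.5))
    print(averaged_sgd(a, b, C, 0.5, 1.0, 1.0, n_iter=20000, step0=0.5))
```

In the usage example (two source and two target points at 0 and 1, squared-distance costs, ε = 0.5, τ1 = τ2 = 1), both routines should approach the same potential because the target penalty makes F strictly concave. Each stochastic step reads a single row of C, which is why semi-dual SGD suits semi-discrete problems where source points arrive only as samples.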
