Annealed Sinkhorn for Optimal Transport: convergence, regularization path and debiasing
Lénaïc Chizat
TL;DR
This paper establishes convergence guarantees for Annealed Sinkhorn under practical concave annealing schedules: via an equivalence with Online Mirror Descent, it shows that OT is recovered if and only if $\beta_t\to\infty$ and $\beta_t-\beta_{t-1}\to 0$. It introduces the regularization path as a tractable proxy for the iterates, revealing an entropic error of $\Theta(\beta_t^{-1})$ and a relaxation error of $\Theta(\beta_t-\beta_{t-1})$; the best trade-off between the two is achieved at $\beta_t=\Theta(t^{1/2})$. To overcome the relaxation bias, the paper proposes Debiased Annealed Sinkhorn, which uses an adaptive reweighting of the marginal $p$ to reduce first-order relaxation effects and enable faster annealing; empirically, it approaches the speed-accuracy Pareto front. The analysis extends to Symmetric Sinkhorn, with analogous interpretations, and points to connections with unbalanced OT. Overall, the results give practical guidance on using annealing to solve OT more efficiently and motivate further work on debiasing and multiscale applications.
Abstract
Sinkhorn's algorithm is a method of choice to solve large-scale optimal transport (OT) problems. In this context, it involves an inverse temperature parameter $\beta$ that determines the speed-accuracy trade-off. To improve this trade-off, practitioners often use a variant of this algorithm, Annealed Sinkhorn, that uses a nondecreasing sequence $(\beta_t)_{t\in \mathbb{N}}$ where $t$ is the iteration count. However, except for the schedule $\beta_t=\Theta(\log t)$, which is impractically slow, it is not known whether this variant is guaranteed to actually solve OT. Our first contribution answers this question: we show that a concave annealing schedule asymptotically solves OT if and only if $\beta_t\to+\infty$ and $\beta_t-\beta_{t-1}\to 0$. The proof is based on an equivalence with Online Mirror Descent and further suggests that the iterates of Annealed Sinkhorn follow the solutions of a sequence of relaxed, entropic OT problems, the regularization path. An analysis of this path reveals that, in addition to the well-known "entropic" error in $\Theta(\beta^{-1}_t)$, the annealing procedure induces a "relaxation" error in $\Theta(\beta_{t}-\beta_{t-1})$. The best error trade-off is achieved with the schedule $\beta_t = \Theta(\sqrt{t})$; this schedule is slow, and this is a universal limitation of the method. Going beyond this limitation, we propose a simple modification of Annealed Sinkhorn that reduces the relaxation error, and therefore enables faster annealing schedules. In toy experiments, we observe the effectiveness of our Debiased Annealed Sinkhorn algorithm: a single run of this algorithm spans the whole speed-accuracy Pareto front of the standard Sinkhorn algorithm.
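To make the annealing idea concrete, here is a minimal sketch of the plain (non-debiased) variant. It assumes the usual log-domain Sinkhorn updates on dual potentials, with the inverse temperature changed at every iteration according to $\beta_t = \beta_0 t^{\kappa}$. The function names, the parameters `beta0` and `kappa`, and the toy problem are illustrative assumptions, not the paper's reference implementation, and the debiasing step is not shown. With this power-law schedule the increments $\beta_t-\beta_{t-1}$ scale like $t^{\kappa-1}$ while $\beta_t^{-1}$ scales like $t^{-\kappa}$, so the two error terms described above balance at $\kappa = 1/2$, which is the default used here.

```python
import numpy as np


def logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def annealed_sinkhorn(C, p, q, n_iters=2000, beta0=1.0, kappa=0.5):
    """Sinkhorn updates with an annealed inverse temperature.

    Assumed schedule (illustrative): beta_t = beta0 * t**kappa.
    The plan is parametrized as exp(beta * (f_i + g_j - C_ij)),
    where f and g are dual potentials expressed in cost units.
    """
    log_p, log_q = np.log(p), np.log(q)
    f = np.zeros(len(p))
    g = np.zeros(len(q))
    beta = beta0
    for t in range(1, n_iters + 1):
        beta = beta0 * t**kappa
        # Row update: makes the row marginals equal to p for the current g.
        f = (log_p - logsumexp(beta * (g[None, :] - C), axis=1)) / beta
        # Column update: makes the column marginals equal to q for the new f.
        g = (log_q - logsumexp(beta * (f[:, None] - C), axis=0)) / beta
    plan = np.exp(beta * (f[:, None] + g[None, :] - C))
    return plan, beta


# Toy 1D problem (an assumption for illustration): five points shifted by 0.3.
# With squared cost and uniform weights, the monotone matching is optimal,
# so the exact OT cost is 0.3**2 = 0.09.
x = np.arange(5.0)
y = x + 0.3
C = (x[:, None] - y[None, :]) ** 2
p = np.full(5, 0.2)
q = np.full(5, 0.2)

plan, beta = annealed_sinkhorn(C, p, q)
cost = float(np.sum(plan * C))
row_err = float(np.abs(plan.sum(axis=1) - p).sum())  # violation of the p marginal
col_err = float(np.abs(plan.sum(axis=0) - q).max())  # violation of the q marginal
print(f"final beta={beta:.2f}  cost={cost:.4f}  row_err={row_err:.2e}")
```

Because the column update comes last, the returned plan matches $q$ up to rounding, and any remaining constraint violation sits on the $p$ marginal. Changing `kappa` is a simple way to experiment with slower or faster schedules on small problems.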
