Annealed Sinkhorn for Optimal Transport: convergence, regularization path and debiasing

Lénaïc Chizat

TL;DR

This paper establishes theoretical convergence guarantees for Annealed Sinkhorn under practical concave annealing schedules, showing that OT is recovered if $\beta_t\to\infty$ and $\beta_t-\beta_{t-1}\to 0$, via an online mirror descent viewpoint. It introduces the regularization path as a tractable proxy, revealing an entropic error of $O(\beta_t^{-1})$ and a relaxation error of $O(\beta_t-\beta_{t-1})$, with the best universal rate achieved at $\beta_t=\Theta(t^{1/2})$. To overcome the relaxation bias, the paper proposes Debiased Annealed Sinkhorn, which leverages an adaptive reweighting of the marginal $p$ to reduce first-order relaxation effects and enable faster annealing (empirically approaching the speed–accuracy Pareto front). Extensions to Symmetric Sinkhorn show analogous interpretations and highlight the potential for unbalanced OT connections. Overall, the results provide practical guidance for using annealing to solve OT more efficiently and motivate further debiasing and multiscale applications.
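As a quick sanity check of the $\Theta(t^{1/2})$ schedule, the two error terms can be balanced by hand. The sketch below assumes a polynomial schedule $\beta_t=\beta_0(1+t)^\kappa$ with $0<\kappa<1$ (the parametrization used in the figure captions) and ignores constants:

$$
\underbrace{\beta_t^{-1}}_{\text{entropic}} \asymp t^{-\kappa},
\qquad
\underbrace{\beta_t-\beta_{t-1}}_{\text{relaxation}} \asymp \kappa\,\beta_0\, t^{\kappa-1},
\qquad
t^{-\kappa} = t^{\kappa-1} \iff \kappa=\tfrac{1}{2}.
$$

For $\kappa<1/2$ the entropic term dominates, for $\kappa>1/2$ the relaxation term dominates, and at $\kappa=1/2$ both are of order $t^{-1/2}$.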

Abstract

Sinkhorn's algorithm is a method of choice to solve large-scale optimal transport (OT) problems. In this context, it involves an inverse temperature parameter $\beta$ that determines the speed-accuracy trade-off. To improve this trade-off, practitioners often use a variant of this algorithm, Annealed Sinkhorn, that uses a nondecreasing sequence $(\beta_t)_{t\in \mathbb{N}}$ where $t$ is the iteration count. However, except for the schedule $\beta_t=\Theta(\log t)$, which is impractically slow, it is not known whether this variant is guaranteed to actually solve OT. Our first contribution answers this question: we show that a concave annealing schedule asymptotically solves OT if and only if $\beta_t\to+\infty$ and $\beta_t-\beta_{t-1}\to 0$. The proof is based on an equivalence with Online Mirror Descent and further suggests that the iterates of Annealed Sinkhorn follow the solutions of a sequence of relaxed, entropic OT problems, the regularization path. An analysis of this path reveals that, in addition to the well-known "entropic" error in $\Theta(\beta^{-1}_t)$, the annealing procedure induces a "relaxation" error in $\Theta(\beta_{t}-\beta_{t-1})$. The best error trade-off is achieved with the schedule $\beta_t = \Theta(\sqrt{t})$ which, albeit slow, is a universal limitation of this method. Going beyond this limitation, we propose a simple modification of Annealed Sinkhorn that reduces the relaxation error, and therefore enables faster annealing schedules. In toy experiments, we observe the effectiveness of our Debiased Annealed Sinkhorn algorithm: a single run of this algorithm spans the whole speed-accuracy Pareto front of the standard Sinkhorn algorithm.
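To make the annealing idea concrete, here is a minimal log-domain sketch in Python. It is an illustration, not the paper's reference implementation. It assumes the schedule $\beta_t=\beta_0(1+t)^\kappa$ from the figure captions, strictly positive marginals, and that the scaling vectors are carried over unchanged when $\beta_t$ changes (a choice suggested by the mirror descent viewpoint, but an assumption here). The names `annealed_sinkhorn` and `logsumexp` are hypothetical helpers.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    s = np.sum(np.exp(x - m), axis=axis, keepdims=True)
    return np.squeeze(m + np.log(s), axis=axis)

def annealed_sinkhorn(c, p, q, beta0=1.0, kappa=0.5, n_iter=200):
    """Sinkhorn iterations with inverse temperature beta0 * (1 + t)**kappa.

    Assumption of this sketch: the log-scalings (log_a, log_b) are kept
    as-is when the temperature changes. Requires n_iter >= 1 and p, q > 0.
    """
    log_p, log_q = np.log(p), np.log(q)
    log_b = np.zeros(c.shape[1])
    for t in range(n_iter):
        beta_t = beta0 * (1.0 + t) ** kappa
        log_k = -beta_t * c  # log of the Gibbs kernel exp(-beta_t * c)
        # Rescale rows so that the row marginal equals p.
        log_a = log_p - logsumexp(log_k + log_b[None, :], axis=1)
        # Rescale columns so that the column marginal equals q.
        log_b = log_q - logsumexp(log_k + log_a[:, None], axis=0)
    return np.exp(log_a[:, None] + log_k + log_b[None, :])

# Toy problem: two points on each side, matching costs on the diagonal.
c = np.array([[0.0, 1.0], [1.0, 0.0]])
p = np.array([0.5, 0.5])
q = np.array([0.5, 0.5])
pi = annealed_sinkhorn(c, p, q)
print("transport cost:", np.sum(c * pi))
```

After the final column update the second marginal of the returned plan matches `q` up to rounding, while the first marginal is only approximately `p`; this residual mismatch is the kind of constraint relaxation the abstract refers to.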

Paper Structure (25 sections, 12 theorems, 61 equations, 4 figures, 5 algorithms)

Introduction
Sinkhorn's algorithm
Annealed Sinkhorn
Contributions
Notation
Convergence of Annealed Sinkhorn
The qualitative picture
Equivalence with online mirror descent
A discussion on quantitative guarantees
Quantifying progress towards OT
Complexity of OT via standard Sinkhorn
Complexity of OT via Annealed Sinkhorn
The regularization path
Proxies for the optimization path
The Online Path
...and 10 more sections

Key Result

Theorem 2.1

Let $(\pi_t)$ be the sequence generated by Annealed Sinkhorn (Alg. \ref{alg:annealed-sinkhorn}) with a positive, nondecreasing and concave annealing schedule $(\beta_t)$, that is, such that its difference sequence $\alpha_t = \beta_{t}-\beta_{t-1}$ is nonnegative and nonincreasing. Let $\beta_\infty = \lim \beta_t$ [...]. In particular, $\pi_\infty$ is an optimal transport plan if and only if $\beta_\infty=+\infty$ and $\alpha_t \to 0$.
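The hypotheses on the schedule can be checked numerically on a finite prefix. The sketch below is only a diagnostic: the helper name `check_schedule` and the tolerance are assumptions of this example, and a finite prefix cannot certify the limits $\beta_t\to+\infty$ or $\alpha_t\to 0$.

```python
import numpy as np

def check_schedule(beta, tol=1e-12):
    """Check positivity, monotonicity and concavity of a finite schedule prefix."""
    beta = np.asarray(beta, dtype=float)
    alpha = np.diff(beta)  # alpha_t = beta_t - beta_{t-1}
    return {
        "positive": bool(np.all(beta > 0)),
        "nondecreasing": bool(np.all(alpha >= -tol)),
        # Concave schedule <=> increments alpha_t are nonincreasing.
        "concave": bool(np.all(np.diff(alpha) <= tol)),
        "last_increment": float(alpha[-1]),
    }

t = np.arange(1000)
sqrt_schedule = (1.0 + t) ** 0.5    # increments shrink towards zero
linear_schedule = 1.0 + t           # concave, but increments stay at 1
quadratic_schedule = (1.0 + t) ** 2  # convex: violates the hypotheses

for name, sched in [("sqrt", sqrt_schedule),
                    ("linear", linear_schedule),
                    ("quadratic", quadratic_schedule)]:
    print(name, check_schedule(sched))
```

The linear schedule satisfies the hypotheses of the theorem but its increments do not vanish, so by the "if and only if" statement it is not expected to recover an optimal transport plan.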

Figures (4)

Figure 1: Comparison of Sinkhorn's algorithm and its annealed variants for their respective optimal annealing schedules of the form $\beta_t=\beta_0 (1+t)^{\kappa}$ (here $\beta_0=10/\Vert c\Vert_\mathrm{osc}$). We plot the OT suboptimality after projecting $\pi_t$ on $\Gamma(p,q)$ via Alg. \ref{alg:projection}. The speed-accuracy Pareto front for Sinkhorn's algorithm is the pointwise minimum of the dashed lines. While Annealed Sinkhorn is far away from this front, the debiased version that we propose approaches or beats it.
Figure 2: (left) Optimality gap at iteration $t=200$ as a function of the annealing exponent $\kappa$, such that $\beta_t=\beta_0 (1+t)^{\kappa}$ (with $\beta_0=10/\Vert c\Vert_\mathrm{osc}$). The optimal exponents ($\star$) are close to the predicted ones: $\kappa=1/2$ for Annealed Sinkhorn and $\kappa=2/3$ for Debiased Annealed Sinkhorn. (right) Distance between the optimization path and the regularization path (dotted lines) compared to their distances to the target. As predicted by Lem. \ref{lem:ell1-regpath}, $\Vert \pi_t^\mathrm{reg}\mathds{1} -p\Vert_1$ is in $\Theta(t^{\kappa-1})$, and it closely approximates $\Vert \pi_t\mathds{1} -p\Vert_1$. The error between both paths (dotted lines) is more than an order of magnitude smaller. (Both experiments use the "geometric" cost.)
Figure 3: (left) Behavior of piecewise constant annealing schedules. We take a base schedule $\bar{\beta}_t =\beta_0(1+t)^\kappa$ and use an actual schedule $\beta_t$ which is updated to the value $\bar{\beta}_t$ only for $t=16k^2$, $k\in \mathbb{N}$ (and is constant otherwise). This standard technique leads to improvements at the end of each plateau, but appears less effective than Debiased Annealed Sinkhorn. (right) Comparison of the symmetric vs. asymmetric versions of Annealed Sinkhorn and their debiasing. The considered problem has no symmetry, which explains why symmetric Sinkhorn underperforms. (Both experiments use the "geometric" cost.)
Figure 4: Configuration of the data points used in the "geometric setting".
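The piecewise constant schedule described in the caption of Figure 3 can be written down in a few lines. The following sketch assumes the base schedule $\beta_0(1+t)^\kappa$ and refresh times $t=16k^2$ stated there; the function names and default values of `beta0` and `kappa` are placeholders of this example, not values from the paper.

```python
import math

def base_schedule(t, beta0=1.0, kappa=0.5):
    """Base schedule beta0 * (1 + t)**kappa."""
    return beta0 * (1 + t) ** kappa

def plateau_schedule(t, beta0=1.0, kappa=0.5, spacing=16):
    """Piecewise constant schedule, refreshed only at t = spacing * k**2.

    Between refresh times the value from the last refresh is kept.
    """
    # Largest integer k with spacing * k**2 <= t.
    k = math.isqrt(t // spacing)
    return base_schedule(spacing * k * k, beta0, kappa)

# Refresh times below t = 200 are 0, 16, 64 and 144.
for t in (0, 15, 16, 63, 64, 143, 144, 199):
    print(t, plateau_schedule(t))
```

The plateaus get longer as $k$ grows, so the number of inner Sinkhorn iterations run at each temperature increases over time.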

Theorems & Definitions (22)

Theorem 2.1: Convergence of Annealed Sinkhorn
proof
Lemma 2.2: OMD guarantee for Annealed Sinkhorn
proof
Lemma 2.3: Altschuler et al. (2017)
Lemma 2.4: Pinsker's inequality
Remark 2.5: Optimization bounds galore for Sinkhorn
Proposition 3.1: Online vs. Regularization paths
proof
Theorem 3.2: Convergence rate of the regularization path
...and 12 more
