Accelerating Sinkhorn Algorithm with Sparse Newton Iterations

Xun Tang, Michael Shavlovsky, Holakou Rahmanian, Elisa Tardini, Kiran Koshy Thekumparampil, Tesi Xiao, Lexing Ying

TL;DR

The paper addresses the slow convergence of the Sinkhorn algorithm for entropic optimal transport (OT) by proposing Sinkhorn-Newton-Sparse (SNS), which combines an early-stopped Sinkhorn stage with a sparsified Newton stage. The core idea is that the Hessian of the Lyapunov potential associated with entropic OT becomes approximately sparse after the Sinkhorn stage, so a Newton direction can be computed at $O(n^{2})$ per-iteration cost. The authors provide a non-asymptotic sparsity analysis, extend the results to cases with non-unique optimal transport plans, and validate the method with numerical experiments that show orders-of-magnitude reductions in iteration counts and competitive runtimes, especially in high-entropy settings and on discretized densities. The result is a scalable second-order acceleration technique for high-precision OT that retains the original problem structure. The paper also introduces augmented Lyapunov formulations to handle degeneracies and gives theoretical sparsity bounds linked to extremal combinatorics in the non-unique setting.
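
For orientation, the objects named in this summary can be written out under one common convention for entropic OT, with cost matrix $C$, marginals $r, c$, and regularization parameter $\eta$. The normalization below is an assumption made for illustration and may differ from the paper's by constants; the symbols $x, y$ (dual variables) and $P(x, y)$ are likewise notation introduced here.

```latex
% Entropic OT (assumed convention: larger \eta means weaker regularization)
\min_{P \ge 0,\; P\mathbf{1} = r,\; P^{\top}\mathbf{1} = c}
  \; C \cdot P + \frac{1}{\eta} \sum_{i,j} p_{ij} \log p_{ij}

% Concave Lyapunov (dual) potential in the variables (x, y)
f(x, y) = -\frac{1}{\eta} \sum_{i,j} \exp\bigl(\eta\,(x_i + y_j - c_{ij}) - 1\bigr)
          + \sum_i r_i x_i + \sum_j c_j y_j

% With P_{ij} = \exp(\eta\,(x_i + y_j - c_{ij}) - 1):
\nabla_x f = r - P\mathbf{1}, \qquad
\nabla_y f = c - P^{\top}\mathbf{1}, \qquad
\nabla^2 f = -\eta
  \begin{bmatrix}
    \operatorname{diag}(P\mathbf{1}) & P \\
    P^{\top} & \operatorname{diag}(P^{\top}\mathbf{1})
  \end{bmatrix}
```

Under this convention the off-diagonal blocks of the Hessian are the current transport plan itself, so the Hessian is approximately sparse whenever the plan concentrates its mass on few entries.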

Abstract

Computing the optimal transport distance between statistical distributions is a fundamental task in machine learning. One remarkable recent advancement is entropic regularization and the Sinkhorn algorithm, which utilizes only matrix scaling and guarantees an approximated solution with near-linear runtime. Despite the success of the Sinkhorn algorithm, its runtime may still be slow due to the potentially large number of iterations needed for convergence. To achieve possibly super-exponential convergence, we present Sinkhorn-Newton-Sparse (SNS), an extension to the Sinkhorn algorithm, by introducing early stopping for the matrix scaling steps and a second stage featuring a Newton-type subroutine. Adopting the variational viewpoint that the Sinkhorn algorithm maximizes a concave Lyapunov potential, we offer the insight that the Hessian matrix of the potential function is approximately sparse. Sparsification of the Hessian results in a fast $O(n^2)$ per-iteration complexity, the same as the Sinkhorn algorithm. In terms of total iteration count, we observe that the SNS algorithm converges orders of magnitude faster across a wide range of practical cases, including optimal transportation between empirical distributions and calculating the Wasserstein $W_1, W_2$ distance of discretized densities. The empirical performance is corroborated by a rigorous bound on the approximate sparsity of the Hessian matrix.
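
The two-stage idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the top-`rho` thresholding rule, the choice to keep the exact diagonal blocks, the dense least-squares solve, and the backtracking constants are all choices made here. In particular, the dense solve costs more than $O(n^2)$; reaching the per-iteration cost quoted in the abstract would require a sparse iterative solver.

```python
import numpy as np

def plan(C, x, y, eta):
    # P_ij = exp(eta * (x_i + y_j - C_ij) - 1); overflow in trial steps is tolerated
    with np.errstate(over="ignore"):
        return np.exp(eta * (x[:, None] + y[None, :] - C) - 1.0)

def potential(C, r, c, x, y, eta):
    # Concave Lyapunov potential (assumed normalization)
    return -plan(C, x, y, eta).sum() / eta + r @ x + c @ y

def sinkhorn_stage(C, r, c, eta, n_iter):
    # Matrix scaling written as alternating exact maximization over x, then y
    x, y = np.zeros(len(r)), np.zeros(len(c))
    for _ in range(n_iter):
        x = x + (np.log(r) - np.log(plan(C, x, y, eta).sum(axis=1))) / eta
        y = y + (np.log(c) - np.log(plan(C, x, y, eta).sum(axis=0))) / eta
    return x, y

def sparsify(P, rho):
    # Keep (roughly) the largest rho-fraction of entries, zero out the rest
    k = max(1, int(np.ceil(rho * P.size)))
    thresh = np.partition(P.ravel(), -k)[-k]
    return np.where(P >= thresh, P, 0.0)

def newton_stage(C, r, c, x, y, eta, rho, n_iter):
    n = len(r)
    for _ in range(n_iter):
        P = plan(C, x, y, eta)
        g = np.concatenate([r - P.sum(axis=1), c - P.sum(axis=0)])  # gradient of f
        Ps = sparsify(P, rho)
        # Sparsified version of -Hessian / eta; exact diagonals keep it PSD
        M = np.block([[np.diag(P.sum(axis=1)), Ps],
                      [Ps.T, np.diag(P.sum(axis=0))]])
        d = np.linalg.lstsq(eta * M, g, rcond=None)[0]  # approximate Newton direction
        f0, step = potential(C, r, c, x, y, eta), 1.0
        while step > 1e-10:  # backtracking line search (Armijo condition)
            xn, yn = x + step * d[:n], y + step * d[n:]
            if potential(C, r, c, xn, yn, eta) >= f0 + 1e-4 * step * (g @ d):
                x, y = xn, yn
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop early
    return x, y

# Synthetic example: two random point clouds with squared Euclidean cost
rng = np.random.default_rng(0)
n = 30
pts_a, pts_b = rng.random((n, 2)), rng.random((n, 2))
C = ((pts_a[:, None, :] - pts_b[None, :, :]) ** 2).sum(axis=-1)
r = rng.random(n) + 0.5; r /= r.sum()
c = rng.random(n) + 0.5; c /= c.sum()
eta = 50.0

x0, y0 = sinkhorn_stage(C, r, c, eta, n_iter=20)                 # early-stopped stage
x1, y1 = newton_stage(C, r, c, x0, y0, eta, rho=0.2, n_iter=10)  # sparse Newton stage
for name, (xs, ys) in {"Sinkhorn warm start": (x0, y0), "after sparse Newton": (x1, y1)}.items():
    Pm = plan(C, xs, ys, eta)
    err = np.abs(Pm.sum(axis=1) - r).sum() + np.abs(Pm.sum(axis=0) - c).sum()
    print(f"{name}: marginal error = {err:.3e}")
```

The line search only accepts steps that increase the potential, so the Newton stage never makes the objective worse than the warm start, even when the sparsified Hessian is a poor approximation.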

Paper Structure

This paper contains 26 sections, 5 theorems, 36 equations, 3 figures, 4 tables, 2 algorithms.

Key Result

Theorem 1

(Informal version of the main theorem) Assume $\min_{P: P\bm{1}=r , P^{\top}\bm{1} = c} C\cdot P$ admits a unique solution. Then, if $t, \eta$ are sufficiently large, the Hessian matrix after $t$ Sinkhorn matrix scaling steps is $\left( \frac{3}{2n}, 12n^2\exp\left(-p\eta\right) + \frac{q}{\sqrt{t}} \right)$-sparse for some parameters $p, q$.
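
To make the statement concrete, the sketch below measures approximate sparsity numerically. It assumes that a matrix is $(\lambda, \epsilon)$-sparse when some matrix with at most a $\lambda$ fraction of nonzero entries lies within $\epsilon$ of it in entrywise $\ell_1$ norm; this reading is not spelled out on this page and should be checked against the paper's definition. Under it, $\lambda = \frac{3}{2n}$ on a $2n \times 2n$ Hessian is a budget of $\frac{3}{2n}(2n)^2 = 6n$ entries. The helper names and the synthetic near-permutation plan are illustrative.

```python
import numpy as np

def l1_sparsification_error(M, lam):
    # Best entrywise l1 error with at most floor(lam * M.size) nonzeros:
    # keep the largest-magnitude entries, pay for everything that is dropped.
    budget = int(np.floor(lam * M.size))
    mags = np.sort(np.abs(M), axis=None)[::-1]
    return float(mags[budget:].sum())

def hessian_like(P):
    # Block matrix with the plan P in the off-diagonal blocks and its
    # marginals on the diagonal (the Hessian up to a scalar factor, assumed form)
    return np.block([[np.diag(P.sum(axis=1)), P],
                     [P.T, np.diag(P.sum(axis=0))]])

# Synthetic plan: almost all mass on a permutation, delta spread uniformly
n, delta = 4, 1e-6
P = (1 - delta) / n * np.eye(n) + delta / n**2 * np.ones((n, n))
H = hessian_like(P)

eps = l1_sparsification_error(H, 3 / (2 * n))  # budget of 6n = 24 entries
print(f"l1 error at lambda = 3/(2n): {eps:.3e}")  # equals delta up to rounding
```

In this toy case the budget covers the $2n$ diagonal entries and both copies of the dominant plan entries, so the leftover error is only the small uniformly spread mass.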

Figures (3)

  • Figure 1: Performance comparison between the Sinkhorn-Newton-Sparse (SNS) algorithm and the Sinkhorn algorithm.
  • Figure 2: Performance of Quasi-Newton methods, compared against the Sinkhorn-Newton-Sparse algorithm and the Sinkhorn algorithm.
  • Figure 3: Optimal transport cost of the obtained entropic regularized solution for different $\eta$.

Theorems & Definitions (11)

  • Definition 1
  • Theorem
  • Definition 2
  • Theorem 1
  • proof
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • proof
  • proof
  • ...and 1 more