Deriving the Gradients of Some Popular Optimal Transport Algorithms

Fangzhou Xie

TL;DR

This note derives gradients by hand for entropy-regularized OT algorithms: the Sinkhorn variants (vanilla, parallel, log-stabilized), Wasserstein barycenters, and Wasserstein dictionary learning. It presents reverse-mode gradient derivations, dual-variable formulations, and numerically stable update schemes (log-sum-exp/soft-min) so that these algorithms can be used in end-to-end gradient-based optimization in machine learning. The derivations underpin the high-performance R/C++ wig package and are validated numerically against automatic differentiation tools in Julia (Zygote and ForwardDiff).

Abstract

In this note, I review the entropy-regularized Monge-Kantorovich problem in Optimal Transport and derive the gradients of several algorithms popular in Computational Optimal Transport, including the Sinkhorn algorithms, Wasserstein Barycenter algorithms, and Wasserstein Dictionary Learning algorithms.

Paper Structure

This paper contains 21 sections, 7 theorems, 156 equations, 5 figures, 10 algorithms.

Key Result

Proposition 3

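The solution to the entropic-regularized OT problem is unique and has the form $P_{ij} = u_i K_{ij} v_j$, or in matrix notation $\mathbf{P} = \operatorname{diag}(\mathbf{u}) \, \mathbf{K} \, \operatorname{diag}(\mathbf{v})$, where $i \in \left\{1, \ldots, M\right\}$ and $j \in \left\{1, \ldots, N\right\}$ with two (unknown) scaling variables $\mathbf{u} \in \mathbb{R}^M$ and $\mathbf{v} \in \mathbb{R}^N$. Here $\mathbf{P}$ is the optimal transport plan and $\mathbf{K}$ is the Gibbs kernel built from the cost matrix and the regularization strength.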
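
The scaling form in Proposition 3 can be illustrated with a minimal sketch of the vanilla Sinkhorn iterations in plain Python. This is not the implementation from the wig package. The function name `sinkhorn`, the symbol `eps` for the regularization strength, the kernel convention $K_{ij} = \exp(-C_{ij}/\varepsilon)$, and the toy marginals and cost matrix are all assumptions made for illustration.

```python
import math

def sinkhorn(a, b, C, eps=0.5, n_iter=500):
    """Vanilla Sinkhorn sketch; returns (P, u, v) with P = diag(u) K diag(v).

    a, b : source and target marginals (lists summing to 1)
    C    : M x N cost matrix as a list of rows
    eps  : regularization strength (name assumed for this sketch)
    """
    M, N = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / eps) (assumed convention)
    K = [[math.exp(-C[i][j] / eps) for j in range(N)] for i in range(M)]
    u = [1.0] * M
    v = [1.0] * N
    for _ in range(n_iter):
        # u <- a / (K v): rescale rows to match the source marginal
        u = [a[i] / sum(K[i][j] * v[j] for j in range(N)) for i in range(M)]
        # v <- b / (K^T u): rescale columns to match the target marginal
        v = [b[j] / sum(K[i][j] * u[i] for i in range(M)) for j in range(N)]
    # Assemble the plan elementwise: P_ij = u_i K_ij v_j
    P = [[u[i] * K[i][j] * v[j] for j in range(N)] for i in range(M)]
    return P, u, v

# Toy problem (illustrative values only)
a = [0.5, 0.5]
b = [0.25, 0.25, 0.5]
C = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0]]

P, u, v = sinkhorn(a, b, C, eps=0.5)
row_sums = [sum(row) for row in P]
col_sums = [sum(P[i][j] for i in range(len(a))) for j in range(len(b))]
```

After enough iterations, `row_sums` should agree with `a` and `col_sums` with `b` up to floating-point error, which is a quick way to check that the returned plan is a valid coupling of the two marginals. For small `eps` the entries of the kernel underflow, which is the motivation for the log-stabilized variants discussed in the note.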

Figures (5)

  • Figure 1: Diagram of the structure of the paper.
  • Figure 2: Computational graph for the vanilla Sinkhorn algorithm.
  • Figure 3: Computational graph for the log-stabilized Sinkhorn algorithm.
  • Figure 4: Computational graph for the parallel barycenter algorithm.
  • Figure 5: Computational graph for the log-stabilized barycenter algorithm.

Theorems & Definitions (27)

  • Proposition 3: Uniqueness of the Entropic OT Solution
  • proof
  • Remark
  • Remark
  • Remark : Convergence Condition
  • Remark
  • Remark
  • Remark : Convergence Condition
  • Definition 4
  • Remark
  • ...and 17 more