Deriving the Gradients of Some Popular Optimal Transport Algorithms
Fangzhou Xie
TL;DR
This note derives manual gradients for entropy-regularized optimal transport (OT), covering three Sinkhorn variants (vanilla, log-stabilized, and parallel) as well as Wasserstein barycenters and Wasserstein dictionary learning. It presents reverse-mode gradient derivations, dual-variable formulations, and numerically stable update schemes (log-sum-exp/soft-min) that enable end-to-end gradient-based optimization in machine learning. The work underpins a high-performance R/C++ package, wig, and includes numerical validation against automatic differentiation tools such as Julia's Zygote and ForwardDiff. The result is an implementable framework for differentiating OT-based objectives across these algorithms, supporting applications such as barycenters and dictionary learning with Wasserstein metrics.
Abstract
In this note, I review the entropy-regularized Monge-Kantorovich problem in Optimal Transport and derive the gradients of several popular algorithms in Computational Optimal Transport, including the Sinkhorn algorithms, Wasserstein Barycenter algorithms, and Wasserstein Dictionary Learning algorithms.
