Bregman-Wasserstein divergence: geometry and applications
Amanjit Singh Kainth, Cale Rankin, Ting-Kam Leonard Wong
TL;DR
The paper introduces the Bregman-Wasserstein (BW) divergence, the optimal transport (OT) cost whose ground cost is a Bregman divergence, and develops a transport geometry that lifts Bregman geometry to the space of probability measures. It defines primal and dual displacement interpolations, derives a generalized Pythagorean inequality, and constructs a generalized dualistic structure, related to the Otto/Lott geometry of Wasserstein space, that lifts the differential geometry of the Bregman divergence to an infinite-dimensional statistical manifold. The authors give a probabilistic interpretation via exponential families, relate BW optimal transport to classical $\mathscr{W}_2$ theory, show how BW optimal transport maps can be estimated with neural approaches, establish the well-posedness of BW barycenters and relate them to Bayesian learning, and introduce a BW-JKO scheme for discretizing Riemannian Wasserstein gradient flows. Together, these results give a geometry-aware generalization of quadratic optimal transport with applications to statistics, Bayesian learning, and optimization over distributions.
Abstract
The Bregman-Wasserstein divergence is the optimal transport cost when the underlying cost function is given by a Bregman divergence, and arises naturally in fields such as statistics and machine learning. We establish fundamental properties of the Bregman-Wasserstein divergence and propose a novel generalized transport geometry that promotes the Bregman geometry to the space of probability distributions. We provide a probabilistic interpretation involving exponential families and define generalized displacement interpolations compatible with the Bregman geometry. These interpolations are used to derive a generalized Pythagorean inequality, which is of independent interest. Furthermore, we construct a generalized dualistic geometry that lifts the differential geometry of the Bregman divergence to an infinite-dimensional statistical manifold. On the computational side, we demonstrate how Bregman-Wasserstein optimal transport maps can be estimated using neural approaches, establish the well-posedness of Bregman-Wasserstein barycenters, and relate them to Bayesian learning. Finally, we introduce the Bregman-Wasserstein JKO scheme for discretizing Riemannian Wasserstein gradient flows.
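To make the definition in the first sentence concrete, the following is a minimal sketch of the discrete case, not the authors' method. It assumes the ground cost $B_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$ (the argument order is a convention and may differ from the paper's), two uniform empirical measures with the same number of atoms, and an exact assignment solver from SciPy in place of the neural estimators mentioned above. The function names `bregman_cost` and `bw_divergence_uniform` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def bregman_cost(phi, grad_phi, X, Y):
    """Pairwise cost matrix C[i, j] = B_phi(x_i, y_j).

    phi maps an (n, d) array to an (n,) array; grad_phi maps (n, d) to (n, d).
    Assumed convention: B_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
    """
    phi_x = phi(X)[:, None]                      # shape (n, 1)
    phi_y = phi(Y)[None, :]                      # shape (1, m)
    G = grad_phi(Y)                              # shape (m, d)
    # <grad phi(y_j), x_i - y_j> = x_i . g_j - y_j . g_j
    inner = X @ G.T - np.sum(G * Y, axis=1)[None, :]
    return phi_x - phi_y - inner


def bw_divergence_uniform(phi, grad_phi, X, Y):
    """Transport cost between uniform empirical measures on the rows of X and Y.

    Assumes X and Y have the same number of rows, so that an optimal coupling
    can be taken to be a permutation and the assignment solver is exact.
    """
    C = bregman_cost(phi, grad_phi, X, Y)
    rows, cols = linear_sum_assignment(C)
    return C[rows, cols].mean()


# Quadratic generator: B_phi(x, y) = 0.5 * |x - y|^2.
def phi_quad(Z):
    return 0.5 * np.sum(Z ** 2, axis=1)


def grad_quad(Z):
    return Z


# Negative entropy on the positive orthant: B_phi is the generalized KL divergence.
def phi_ent(Z):
    return np.sum(Z * np.log(Z) - Z, axis=1)


def grad_ent(Z):
    return np.log(Z)


rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(6, 3))
Y = rng.uniform(0.5, 2.0, size=(6, 3))

bw_quad = bw_divergence_uniform(phi_quad, grad_quad, X, Y)
bw_kl = bw_divergence_uniform(phi_ent, grad_ent, X, Y)
```

With the quadratic generator the cost matrix is half the squared Euclidean distance, so `bw_quad` is half the squared 2-Wasserstein distance between the two empirical measures. Unlike that case, a general Bregman cost is not symmetric, so swapping `X` and `Y` can change the value.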
