Bregman-Wasserstein divergence: geometry and applications

Amanjit Singh Kainth, Cale Rankin, Ting-Kam Leonard Wong

TL;DR

The paper introduces the Bregman-Wasserstein divergence, an optimal transport cost whose ground cost is a Bregman divergence, and develops a geometric framework that lifts Bregman geometry to the space of probability measures. It establishes primal and dual displacement interpolations, a generalized Pythagorean inequality, and a generalized dualistic structure that extends Otto/Lott information geometry to infinite dimensions. The authors provide probabilistic interpretations via exponential families, relate BW OT to classical $\mathscr{W}_2$ theory, and present neural OT methods, BW barycenters with Bayesian connections, and a BW-JKO scheme for discretizing Riemannian Wasserstein gradient flows. Together, these contributions offer a tractable, geometry-aware generalization of optimal transport with implications for statistics, Bayesian learning, and distributional optimization.

Abstract

The Bregman-Wasserstein divergence is the optimal transport cost when the underlying cost function is given by a Bregman divergence, and arises naturally in fields such as statistics and machine learning. We establish fundamental properties of the Bregman-Wasserstein divergence and propose a novel generalized transport geometry that promotes the Bregman geometry to the space of probability distributions. We provide a probabilistic interpretation involving exponential families and define generalized displacement interpolations compatible with the Bregman geometry. These interpolations are used to derive a generalized Pythagorean inequality, which is of independent interest. Furthermore, we construct a generalized dualistic geometry that lifts the differential geometry of the Bregman divergence to an infinite-dimensional statistical manifold. On the computational side, we demonstrate how Bregman-Wasserstein optimal transport maps can be estimated using neural approaches, establish the well-posedness of Bregman-Wasserstein barycenters, and relate them to Bayesian learning. Finally, we introduce the Bregman-Wasserstein JKO scheme for discretizing Riemannian Wasserstein gradient flows.
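As a rough illustration of the definition in the abstract, the sketch below computes a Bregman-Wasserstein cost between two uniform empirical distributions with the same number of atoms. In that case the optimal coupling can be taken to be a permutation, so an assignment solver gives the exact value. Everything here is an assumption made for illustration and not taken from the paper: the generator $\Omega(x) = \sum_i (x_i \log x_i - x_i)$ on the positive orthant, the argument order of the cost, and the helper names `bregman_cost_matrix` and `empirical_bw_divergence`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def neg_entropy(x):
    # Assumed generator: Omega(x) = sum_i (x_i log x_i - x_i), for x > 0.
    return np.sum(x * np.log(x) - x, axis=-1)


def grad_neg_entropy(x):
    # Mirror map DOmega(x) = log x (componentwise).
    return np.log(x)


def bregman_cost_matrix(xs, xs_prime, omega, grad_omega):
    # C[i, j] = Omega(x_i) - Omega(x'_j) - <DOmega(x'_j), x_i - x'_j>
    g = grad_omega(xs_prime)                                  # shape (m, d)
    inner = xs @ g.T - np.sum(g * xs_prime, axis=1)[None, :]  # <g_j, x_i - x'_j>
    return omega(xs)[:, None] - omega(xs_prime)[None, :] - inner


def empirical_bw_divergence(xs, xs_prime, omega, grad_omega):
    # Uniform weights and equal sample sizes: the transport problem
    # reduces to an assignment problem over permutations.
    cost = bregman_cost_matrix(xs, xs_prime, omega, grad_omega)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean(), cols


rng = np.random.default_rng(0)
mu_samples = rng.uniform(0.5, 2.0, size=(200, 2))
nu_samples = rng.uniform(1.0, 3.0, size=(200, 2))
value, matching = empirical_bw_divergence(
    mu_samples, nu_samples, neg_entropy, grad_neg_entropy
)
print(f"empirical Bregman-Wasserstein cost: {value:.4f}")
```

Swapping in $\Omega(x) = \tfrac{1}{2}|x|^2$ (with gradient $x$) turns the cost matrix into half the squared Euclidean distance, which recovers the usual quadratic transport cost up to the factor $\tfrac{1}{2}$.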

Paper Structure

This paper contains 29 sections, 17 theorems, 162 equations, 8 figures, 2 tables.

Key Result

Theorem 2.4

Let $p \in \mathcal{M}$. Let $(\gamma_t)_{0 \leq t \leq 1}$ be a primal geodesic with $\gamma_0 = p$ and $(\sigma_t)_{0 \leq t \leq 1}$ be a dual geodesic with $\sigma_0 = p$. Then for $0 \leq t \leq 1$ we have [displayed identity not recovered]. In particular, the generalized Pythagorean relation ${\bf B}(p, \sigma_t) + {\bf B}(\gamma_t, p) = {\bf B}(\gamma_t, \sigma_t)$ holds if and only if $\gamma$ and $\sigma$ are orthogonal at [...]
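The Pythagorean relation can be checked numerically through the standard three-point identity for Bregman divergences, ${\bf B}(q, r) = {\bf B}(q, p) + {\bf B}(p, r) - \langle x_q - x_p, y_r - y_p \rangle$, where $x$ and $y = D\Omega(x)$ denote primal and dual coordinates. The sketch below is not quoted from the paper and rests on stated assumptions: the convention ${\bf B}(p, q) = \Omega(x_p) - \Omega(x_q) - \langle D\Omega(x_q), x_p - x_q \rangle$, primal geodesics that are affine in primal coordinates, dual geodesics that are affine in dual coordinates, and the negative-entropy generator as a concrete $\Omega$. Under these assumptions the gap ${\bf B}(\gamma_t, \sigma_t) - {\bf B}(\gamma_t, p) - {\bf B}(p, \sigma_t)$ equals $-t^2 \langle v, w \rangle$ for the coordinate velocities $v$ and $w$, so it vanishes exactly when $v$ and $w$ are orthogonal under the Euclidean pairing of primal and dual coordinates.

```python
import numpy as np


def omega(x):
    # Assumed generator: Omega(x) = sum_i (x_i log x_i - x_i), for x > 0.
    return float(np.sum(x * np.log(x) - x))


def d_omega(x):
    # Mirror map from primal to dual coordinates.
    return np.log(x)


def d_omega_star(y):
    # Inverse mirror map from dual to primal coordinates.
    return np.exp(y)


def bregman(x, x_prime):
    # B(x, x') = Omega(x) - Omega(x') - <DOmega(x'), x - x'>
    return omega(x) - omega(x_prime) - float(np.dot(d_omega(x_prime), x - x_prime))


def pythagorean_gap(x_p, v, w, t):
    # Primal geodesic from p: straight line in primal coordinates.
    x_gamma = x_p + t * v
    # Dual geodesic from p: straight line in dual coordinates, mapped back.
    x_sigma = d_omega_star(d_omega(x_p) + t * w)
    return bregman(x_gamma, x_sigma) - bregman(x_gamma, x_p) - bregman(x_p, x_sigma)


x_p = np.array([1.0, 2.0, 0.5])
v = np.array([0.3, -0.2, 0.1])
w_orthogonal = np.array([0.5, 0.4, -0.7])  # <v, w> = 0
w_generic = np.array([0.5, 0.4, 0.7])      # <v, w> = 0.14

for t in (0.25, 0.5, 1.0):
    gap_orth = pythagorean_gap(x_p, v, w_orthogonal, t)
    gap_gen = pythagorean_gap(x_p, v, w_generic, t)
    print(t, gap_orth, gap_gen, -t**2 * float(v @ w_generic))
```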

Figures (8)

  • Figure 1: A Bregman generator $\Omega$ induces two coordinate systems related by the mirror map $D\Omega$. Here $\iota: \mathcal{M}\rightarrow \mathcal{X}$ and $\iota^* : \mathcal{M} \rightarrow \mathcal{Y}$ are, respectively, the primal and dual coordinate maps, i.e., $\iota(p) = x_p$ and $\iota^*(p) = y_p$. They are related by $\iota^* = D\Omega \circ \iota$ or equivalently $\iota = D\Omega^* \circ \iota^*$.
  • Figure 2: Optimal matching in the context of Example \ref{eg:optimal.matching} where $\mathcal{Y} = (-1, 1)^d$. Left: Empirical distribution $\mu_0^{\mathcal{Y}}$ at time $0$. Right: Empirical distribution $\mu_1^{\mathcal{Y}}$ at time $1$. Here $d = 2$ and each distribution has $N = 1000$ data points. The optimal matching is indicated by the coloring of the points.
  • Figure 3: Illustration of primal displacement interpolation. We first find a convex gradient $Dh$ which pushes $\mu_0^{\mathcal{Y}}$ forward to $\mu_1^{\mathcal{X}}$. Under the primal representation, the primal displacement interpolation $\mu_t^{\mathcal{X}}$ is defined as the pushforward of $\mu_0^{\mathcal{Y}}$ under the convex combination $Dh_t := (1 - t) D\Omega^* + t Dh$. This guarantees that each particle moves along a primal geodesic $(x_t)$.
  • Figure 4: Illustration of primal and dual displacement interpolations in the context of Example \ref{eg:generalized.geodesics}. The source distribution $\mu_0$ is shown in blue and the target $\mu_1$ is shown in green. The first row shows the dual domain (simplex) and the second row shows the primal domain ($\mathbb{R}^2$). The first column illustrates the primal displacement interpolation, and the second column the dual displacement interpolation. We also plot the primal and dual geodesics traced by three particles (shades of orange).
  • Figure 5: Bregman-Wasserstein OT in the context of Example \ref{eg:neural.OT}. Figures \ref{fig:ddi_rout_2_Euclidean}-\ref{fig:ddi_rout_2_ConjugateHNNTanh__beta_1.0_} plot the recovered Brenier map $Df$ from $\mu_0^{\mathcal{X}}$ to $\mu_1^{\mathcal{Y}}$ for various choices of the separable Bregman generator $\Omega$, while Figures \ref{fig:rout_2_pdi} and \ref{fig:rout_2_ddi} trace (on the primal domain $\mathcal{X}$) the primal and dual displacement interpolations induced by the Brenier maps $Dh$ and $Df$ for the corresponding $\Omega$.
  • ...and 3 more figures
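
The mirror-map relations in the Figure 1 caption and the interpolation formula $Dh_t = (1 - t) D\Omega^* + t Dh$ in the Figure 3 caption can be sketched in one dimension, where a monotone rearrangement of sorted samples plays the role of the convex gradient $Dh$. This is a toy construction under stated assumptions, not the paper's method: the generator $\Omega(x) = x \log x - x$ on $(0, \infty)$, empirical measures with equally many atoms, and the helper name `primal_displacement_interpolation` are all chosen for illustration.

```python
import numpy as np


def d_omega(x):
    # Assumed mirror map DOmega(x) = log x, taking primal to dual coordinates.
    return np.log(x)


def d_omega_star(y):
    # Inverse mirror map DOmega*(y) = exp(y), taking dual to primal coordinates.
    return np.exp(y)


def primal_displacement_interpolation(x0, x1, t):
    """Primal-coordinate samples of mu_t for 1D empirical measures of equal size."""
    # Dual representation of mu_0; log is increasing, so the order is preserved.
    y0 = d_omega(np.sort(x0))
    # In 1D, the monotone map y0[i] -> x1_sorted[i] is the gradient of a convex
    # function, so it stands in for Dh pushing mu_0^Y forward to mu_1^X.
    dh_y0 = np.sort(x1)
    # Dh_t = (1 - t) DOmega* + t Dh, evaluated at the dual samples of mu_0.
    return (1.0 - t) * d_omega_star(y0) + t * dh_y0


rng = np.random.default_rng(0)
x0 = rng.uniform(0.5, 1.5, size=500)
x1 = rng.uniform(2.0, 4.0, size=500)

# Mirror maps are mutually inverse: iota = DOmega* o iota^*.
print(np.allclose(d_omega_star(d_omega(x0)), x0))

for t in (0.0, 0.5, 1.0):
    xt = primal_displacement_interpolation(x0, x1, t)
    print(t, xt.mean())
```

Each particle here moves along a straight line in primal coordinates, which matches the caption's statement that particles follow primal geodesics, under the assumption that those geodesics are affine in primal coordinates.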

Theorems & Definitions (57)

  • Definition 2.1: Regular Bregman generator
  • Remark 2.2
  • Remark 2.3
  • Theorem 2.4: Generalized Pythagorean theorem for Bregman divergence
  • proof
  • Example 2.5: Finite simplex
  • Example 2.6: Hopfield neural network
  • Definition 2.7: Bregman-Wasserstein divergence
  • Example 2.8: Self-dual case
  • Example 2.9
  • ...and 47 more