
The Wasserstein gradient flow of the Sinkhorn divergence between Gaussian distributions

Mathis Hardion, Théo Lacombe

TL;DR

This work analyzes the Wasserstein gradient flow of the Sinkhorn divergence $S_\varepsilon(\cdot,\mu_\star)$ when both source and target are Gaussian, establishing existence, Gaussian invariance, and uniqueness (within a regular class of measures) of the flow. It derives explicit mean and covariance dynamics, proves global convergence to the target when the initial covariance is non-singular, and gives counter-examples to global convergence when it is singular. In the commuting case, it shows exponential convergence when source and target share their support, and $O(t^{-1})$ rates when the target lies in a strict subspace of the source's support. The results connect closed-form Gaussian Sinkhorn formulas with the Bures–Wasserstein geometry, yielding precise eigenvalue evolutions and energy-dissipation relations, and are complemented by explicit time-discretization schemes and numerical experiments. Collectively, the paper provides a rigorous, partial convergence theory for Gaussian Sinkhorn flows and offers practical guidance for simulations and for extensions to broader settings.
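The closed-form Gaussian Sinkhorn formulas mentioned above are easiest to see in the commuting (co-diagonal) case, where everything splits into one-dimensional problems. The following is a minimal sketch under an assumed convention, $\mathrm{OT}_\varepsilon(\mu,\nu)=\min_\pi \int\|x-y\|^2\,d\pi+\varepsilon\,\mathrm{KL}(\pi\,\|\,\mu\otimes\nu)$ and $S_\varepsilon(\mu,\nu)=\mathrm{OT}_\varepsilon(\mu,\nu)-\tfrac12\mathrm{OT}_\varepsilon(\mu,\mu)-\tfrac12\mathrm{OT}_\varepsilon(\nu,\nu)$; the paper may normalize the cost or $\varepsilon$ differently, which would change constants. The helper names are hypothetical. In 1D, optimizing over Gaussian couplings with cross-covariance $c$ between variances $a$ and $b$ gives the optimality condition $2c^2+\varepsilon c-2ab=0$, which the code solves in a cancellation-free form.

```python
import math

def cross_cov(a: float, b: float, eps: float) -> float:
    """Optimal cross-covariance of the entropic plan between 1D variances a and b.

    Positive root of 2c^2 + eps*c - 2ab = 0, written as 4ab / (eps + sqrt(eps^2 + 16ab))
    to avoid cancellation (assumed convention: cost |x-y|^2, eps * KL(pi | mu x nu)).
    """
    return 4.0 * a * b / (eps + math.sqrt(eps * eps + 16.0 * a * b))

def entropic_ot_1d(m_a: float, a: float, m_b: float, b: float, eps: float) -> float:
    """Closed-form OT_eps between the 1D Gaussians N(m_a, a) and N(m_b, b)."""
    r = math.sqrt(eps * eps + 16.0 * a * b)
    c = cross_cov(a, b, eps)
    # KL term -(eps/2) log(1 - c^2/(ab)), simplified with ab - c^2 = eps*c/2;
    # it stays finite (and equals 0) when a or b is zero.
    kl_term = -0.5 * eps * math.log(2.0 * eps / (eps + r))
    return (m_a - m_b) ** 2 + a + b - 2.0 * c + kl_term

def sinkhorn_divergence(m, lam, m_star, lam_star, eps):
    """S_eps between N(m, diag(lam)) and N(m_star, diag(lam_star)) (hypothetical helper).

    With co-diagonal covariances both measures are products of 1D Gaussians,
    so OT_eps, and hence S_eps, is a sum over coordinates.
    """
    total = 0.0
    for mi, li, msi, lsi in zip(m, lam, m_star, lam_star):
        total += (entropic_ot_1d(mi, li, msi, lsi, eps)
                  - 0.5 * entropic_ot_1d(mi, li, mi, li, eps)
                  - 0.5 * entropic_ot_1d(msi, lsi, msi, lsi, eps))
    return total

# Source N(0, Id) and a target supported on a line, diag(0, 3).
print(sinkhorn_divergence([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 3.0], eps=0.5))
```

As $\varepsilon\to0$ the 1D value tends to the squared Wasserstein distance $(m_a-m_b)^2+(\sqrt a-\sqrt b)^2$, and zero-variance (singular) directions need no special case.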

Abstract

We study the Wasserstein gradient flow of the Sinkhorn divergence when both the source and the target are Gaussian distributions. We prove the existence of a flow that stays in the class of Gaussian distributions, and is unique in the larger class of measures with strongly-concave and smooth log-densities. We prove that the flow globally converges toward the target measure when the source's covariance matrix is not singular, and provide counter-examples to global convergence when it is, giving a first answer to an open question raised in [Carlier et al. 2024, \S4.2]. When the covariance matrix of the source distribution commutes with that of the target, we derive more quantitative results that showcase exponential convergence toward the target when the source and the target share their support, dropping to linear rates ($O(t^{-1})$) if the target is concentrated on a strict subspace of the source's support.
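The two commuting-case regimes can be illustrated one eigenvalue at a time. This sketch assumes the convention $\mathrm{OT}_\varepsilon=\min_\pi\int\|x-y\|^2d\pi+\varepsilon\,\mathrm{KL}(\pi\,\|\,\mu\otimes\nu)$ and assumes the flow reduces, along each shared eigen-direction, to the Bures–Wasserstein dynamics $\dot\lambda=-4\lambda\,\partial_\lambda s_\varepsilon(\lambda,\lambda_\star)$, where $s_\varepsilon$ is the 1D Sinkhorn divergence between variances. The envelope theorem then gives $\dot\lambda=4\,(c_\varepsilon(\lambda,\lambda_\star)-c_\varepsilon(\lambda,\lambda))$ with $c_\varepsilon(a,b)=4ab/(\varepsilon+\sqrt{\varepsilon^2+16ab})$. The explicit Euler scheme, `euler_flow`, and its step size are illustrative choices, not the paper's discretization.

```python
import math

def cross_cov(a: float, b: float, eps: float) -> float:
    # Optimal 1D entropic cross-covariance (assumed convention: cost |x-y|^2, eps * KL).
    return 4.0 * a * b / (eps + math.sqrt(eps * eps + 16.0 * a * b))

def eigenvalue_velocity(lam: float, lam_star: float, eps: float) -> float:
    # d(lam)/dt = -4 lam * ds/dlam, where ds/dlam = (c(lam, lam) - c(lam, lam_star)) / lam.
    return 4.0 * (cross_cov(lam, lam_star, eps) - cross_cov(lam, lam, eps))

def euler_flow(lam0: float, lam_star: float, eps: float,
               tau: float = 1e-2, t_max: float = 10.0) -> list:
    """Explicit Euler trajectory of one covariance eigenvalue (hypothetical scheme)."""
    lam, traj = lam0, [lam0]
    for _ in range(int(round(t_max / tau))):
        lam = lam + tau * eigenvalue_velocity(lam, lam_star, eps)
        traj.append(lam)
    return traj

full = euler_flow(0.5, 2.0, eps=1.0, t_max=20.0)    # target shares the support
sing = euler_flow(1.0, 0.0, eps=1.0, t_max=100.0)   # target is singular in this direction
print(abs(full[-1] - 2.0))   # exponentially small error
print(8 * 100.0 * sing[-1])  # close to 1 if lam(t) ~ eps / (8 t)
```

For $\lambda_\star=0$ the velocity is $-4c_\varepsilon(\lambda,\lambda)\approx-8\lambda^2/\varepsilon$ for small $\lambda$, so $\lambda(t)\approx\varepsilon/(8t)$: a $1/t$ decay consistent with the $O(t^{-1})$ rate stated above, whereas a positive $\lambda_\star$ is approached exponentially fast.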

Paper Structure

This paper contains 26 sections, 17 theorems, 68 equations, and 4 figures.

Key Result

Theorem 1.1

Let $\mu_0, \mu_\star$ be Gaussian measures, and $\mathrm{supp}(\mu_0)$, $\mathrm{supp}(\mu_\star)$ their respective supports. There exists a unique solution $(\mu_t)_t$ of equation (SWGF) that stays Gaussian; it is a Wasserstein gradient flow of $S_\varepsilon(\cdot, \mu_\star)$ in the sense of [ambrosio20]
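Because the flow stays Gaussian, it can be tracked through its mean $m_t$ and covariance $\Sigma_t$. A hedged sketch of the resulting ODEs, assuming the standard Bures–Wasserstein form of a gradient flow restricted to Gaussians, the convention $\mathrm{OT}_\varepsilon=\min_\pi\int\|x-y\|^2d\pi+\varepsilon\,\mathrm{KL}(\pi\,\|\,\mu\otimes\nu)$, and co-diagonal covariances with eigenvalues $\lambda_i$ (source) and $\lambda_i^\star$ (target); the paper's own normalization may change the constants:

```latex
\begin{aligned}
&\dot m_t = -\nabla_m F, \qquad
 \dot\Sigma_t = -2\big(\nabla_\Sigma F\,\Sigma_t + \Sigma_t\,\nabla_\Sigma F\big),
 \qquad F(m,\Sigma) = S_\varepsilon\big(\mathcal N(m,\Sigma),\mu_\star\big),\\
&F(m,\Sigma) = \|m-m_\star\|^2 + \sum_i s_\varepsilon(\lambda_i,\lambda_i^\star)
 \quad\text{(co-diagonal case)},\\
&c_\varepsilon(a,b) = \frac{4ab}{\varepsilon+\sqrt{\varepsilon^2+16ab}}, \qquad
 \partial_a \mathrm{OT}_\varepsilon(a,b) = 1-\frac{c_\varepsilon(a,b)}{a}
 \quad\text{(envelope theorem)},\\
&\partial_\lambda s_\varepsilon(\lambda,\lambda_\star)
 = \frac{c_\varepsilon(\lambda,\lambda)-c_\varepsilon(\lambda,\lambda_\star)}{\lambda},\\
&\dot m_t = -2\,(m_t-m_\star), \qquad
 \dot\lambda_i = -4\lambda_i\,\partial_\lambda s_\varepsilon(\lambda_i,\lambda_i^\star)
 = 4\big(c_\varepsilon(\lambda_i,\lambda_i^\star)-c_\varepsilon(\lambda_i,\lambda_i)\big).
\end{aligned}
```

Since $c_\varepsilon(a,b)$ is increasing in $b$, each $\lambda_i$ moves monotonically toward $\lambda_i^\star$; when $\lambda_i^\star=0$ the velocity behaves like $-8\lambda_i^2/\varepsilon$, which gives a $1/t$ decay rather than an exponential one.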

Figures (4)

  • Figure 1: Covariance ellipses of the flow for a non-singular source and a non-singular (left) vs singular (right) target (red). The gray grid lines are spaced by $\sqrt{\varepsilon}$.
  • Figure 2: Covariance ellipses for singular Gaussian distributions, in an orthogonal configuration (left, with y-axis marginal for visual clarity) vs. slightly rotated (right).
  • Figure 3: Values of $S_\varepsilon(\mu_t, \mu_\star)/S_\varepsilon(\mu_0,\mu_\star)$ over time for $\Sigma_0 =\mathrm{Id}$ (commuting case, left) and $\Sigma_0$ the same as in Figure 1 (non-commuting case, middle), with target $\Sigma_\star = \mathrm{diag}(0, \lambda^\star)$ for different values of $\lambda^\star$, as a semi-log plot. On the right, the conditions are the same as in the middle panel, but over a longer time interval and in log-log scale.
  • Figure 4: Values of $S_\varepsilon(\mu_t, \mu_\star)/S_\varepsilon(\mu_0,\mu_\star)$ over time for different values of $\varepsilon$ (same source and target as in the left panel of Figure 1).
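The normalized energy $S_\varepsilon(\mu_t,\mu_\star)/S_\varepsilon(\mu_0,\mu_\star)$ shown in Figure 3 can be computed directly in the commuting setting of its left panel, $\Sigma_0=\mathrm{Id}$ and $\Sigma_\star=\mathrm{diag}(0,\lambda^\star)$. This sketch assumes centered Gaussians, the convention $\mathrm{OT}_\varepsilon=\min_\pi\int\|x-y\|^2d\pi+\varepsilon\,\mathrm{KL}(\pi\,\|\,\mu\otimes\nu)$, and an explicit Euler step on the assumed per-eigenvalue ODE $\dot\lambda=4(c_\varepsilon(\lambda,\lambda_\star)-c_\varepsilon(\lambda,\lambda))$; the function names, step size, and $\lambda^\star$ values are hypothetical, and it is not meant to reproduce the paper's figures.

```python
import math

def cross_cov(a: float, b: float, eps: float) -> float:
    # Optimal 1D entropic cross-covariance (assumed convention: cost |x-y|^2, eps * KL).
    return 4.0 * a * b / (eps + math.sqrt(eps * eps + 16.0 * a * b))

def ot_1d(a: float, b: float, eps: float) -> float:
    # Closed-form entropic OT between centered 1D Gaussians with variances a and b.
    r = math.sqrt(eps * eps + 16.0 * a * b)
    return a + b - 2.0 * cross_cov(a, b, eps) - 0.5 * eps * math.log(2.0 * eps / (eps + r))

def sinkhorn_diag(lam, lam_star, eps):
    # Co-diagonal, centered case: S_eps is a sum of 1D Sinkhorn divergences.
    return sum(ot_1d(l, ls, eps) - 0.5 * ot_1d(l, l, eps) - 0.5 * ot_1d(ls, ls, eps)
               for l, ls in zip(lam, lam_star))

def energy_ratio(lam0, lam_star, eps, tau=1e-2, t_max=10.0):
    """S_eps(mu_t, mu_star) / S_eps(mu_0, mu_star) along an explicit Euler scheme."""
    lam = list(lam0)
    s0 = sinkhorn_diag(lam, lam_star, eps)
    ratios = [1.0]
    for _ in range(int(round(t_max / tau))):
        # One Euler step per eigenvalue of the assumed Bures-Wasserstein dynamics.
        lam = [l + tau * 4.0 * (cross_cov(l, ls, eps) - cross_cov(l, l, eps))
               for l, ls in zip(lam, lam_star)]
        ratios.append(sinkhorn_diag(lam, lam_star, eps) / s0)
    return ratios

# Sigma_0 = Id and Sigma_star = diag(0, lam_star) for a few illustrative lam_star.
curves = {ls: energy_ratio([1.0, 1.0], [0.0, ls], eps=1.0) for ls in (0.5, 1.0, 2.0)}
```

Plotting `curves[ls]` against $t_k=k\tau$ on a semi-log scale gives the analogue of that panel. Because the first eigenvalue has to collapse to zero, its polynomial decay eventually dominates the ratio, while the second direction settles exponentially fast.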

Theorems & Definitions (37)

  • Theorem 1.1
  • Proposition 1
  • Lemma 1
  • proof
  • proof: Proof of thm:Seps-gauss
  • Lemma 2
  • proof
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 3
  • ...and 27 more