Wasserstein KL-divergence for Gaussian distributions

Adwait Datar, Nihat Ay

TL;DR

The paper defines a Wasserstein-based analogue of the KL-divergence, $D_{\rm WKL}$, on $\mathbb{R}^n$ via gradient-flow transport of a potential and derives an explicit Gaussian-family formula in terms of means and covariances. It provides a unified multivariate expression using matrices $R$ and $Q$, and a univariate corollary with a finite Dirac-limit behavior, highlighting favorable geometry-aligned properties over classical KL-divergence. The results connect Wasserstein geometry with information geometry for Gaussian distributions, yielding finite divergences for point-mass limits and offering potential benefits for optimization and estimation tasks. The work lays groundwork for future information-theoretic applications of $D_{\rm WKL}$ and its empirical evaluation in machine learning contexts.

Abstract

We introduce a new version of the KL-divergence for Gaussian distributions which is based on Wasserstein geometry and referred to as WKL-divergence. We show that this version is consistent with the geometry of the sample space ${\Bbb R}^n$. In particular, we can evaluate the WKL-divergence of the Dirac measures concentrated in two points which turns out to be proportional to the squared distance between these points.
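For contrast with the finite Dirac-limit behavior described above, the sketch below evaluates the classical KL-divergence between two univariate Gaussians using the standard textbook closed form. It does not implement the paper's WKL formula, which this summary does not reproduce. The function name `kl_gaussian_1d` and the chosen means (0 and 1) are illustrative assumptions.

```python
import math


def kl_gaussian_1d(m0, s0, m1, s1):
    """Classical KL(N(m0, s0^2) || N(m1, s1^2)) for univariate Gaussians.

    Standard closed form; hypothetical helper, not taken from the paper.
    """
    if s0 <= 0 or s1 <= 0:
        raise ValueError("standard deviations must be positive")
    return math.log(s1 / s0) + (s0**2 + (m0 - m1) ** 2) / (2 * s1**2) - 0.5


# Two Gaussians with means 0 and 1 sharing one standard deviation sigma.
# As sigma shrinks, both approach Dirac measures at distinct points.
for sigma in (1.0, 0.1, 0.01, 0.001):
    value = kl_gaussian_1d(0.0, sigma, 1.0, sigma)
    print(f"sigma={sigma:g}  classical KL={value:g}")
```

With a shared standard deviation the expression reduces to $(m_0-m_1)^2/(2\sigma^2)$, so the classical KL-divergence grows without bound as $\sigma \to 0$, whereas the abstract states that the WKL-divergence between the two limiting Dirac measures stays finite and is proportional to the squared distance between the points.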

Paper Structure

This paper contains 7 sections, 4 theorems, 29 equations, 2 figures.

Key Result

Lemma

Consider the gradient flow dynamics eq:dynamics with a quadratic function $f$ given in eq:potential and let $M=2Ae^{2A}-e^{2A}+I$. The following identity holds for all $x_0\in \mathbb{R}^n$:

Figures (2)

  • Figure 1: The left panel shows the variation in $D_{\rm WKL}$ and $D_{\rm KL}$ as $\sigma$ shrinks to $0$. The right panel shows the local variation in $D_{\rm WKL}$ and $D_{\rm KL}$ as $\sigma$ varies around $\sigma_{\textnormal{opt}}\in \{1,3\}$.
  • Figure 2: Surface plots for $D_{\rm WKL}(\mu \lVert \nu)$ (left) and $D_{\rm KL}(\nu \lVert \mu)$ (right), where $\mu=\mathcal{N}(0,\sigma_0^2)$ and $\nu=\mathcal{N}(1,\sigma_1^2)$.

Theorems & Definitions (8)

  • Lemma
  • Proof
  • Lemma
  • Proof
  • Theorem
  • Proof
  • Corollary
  • Proof