Wasserstein KL-divergence for Gaussian distributions

Adwait Datar; Nihat Ay

TL;DR

The paper defines a Wasserstein-based analogue of the KL-divergence, $D_{\rm WKL}$, on $\mathbb{R}^n$ via gradient-flow transport of a potential, and derives an explicit formula for the Gaussian family in terms of means and covariances. It gives a unified multivariate expression using matrices $R$ and $Q$, together with a univariate corollary whose Dirac limit is finite, highlighting properties that are better aligned with the geometry of the sample space than those of the classical KL-divergence. The results connect Wasserstein geometry with information geometry for Gaussian distributions: the divergence stays finite in point-mass limits, which may benefit optimization and estimation tasks. The work lays groundwork for future information-theoretic applications of $D_{\rm WKL}$ and for its empirical evaluation in machine learning contexts.

Abstract

We introduce a new version of the KL-divergence for Gaussian distributions, which is based on Wasserstein geometry and referred to as the WKL-divergence. We show that this version is consistent with the geometry of the sample space $\mathbb{R}^n$. In particular, we can evaluate the WKL-divergence of the Dirac measures concentrated at two points, which turns out to be proportional to the squared distance between these points.
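To make the Dirac-limit motivation concrete, the following minimal sketch compares two standard closed forms for univariate Gaussians as the common standard deviation shrinks: the classical KL-divergence and the squared 2-Wasserstein distance. It does not implement $D_{\rm WKL}$, whose formula is not reproduced in this section; the function names `kl_gauss_1d` and `w2_squared_gauss_1d` are illustrative assumptions, not notation from the paper.

```python
import math


def kl_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Classical KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), standard closed form."""
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
        - 0.5
    )


def w2_squared_gauss_1d(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between the same two univariate Gaussians."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2


if __name__ == "__main__":
    # Two Gaussians centred at 0 and 1 with a shared, shrinking standard deviation.
    # Both measures approach Dirac masses at 0 and 1 as sigma tends to 0.
    for sigma in (1.0, 1e-1, 1e-2, 1e-3):
        kl = kl_gauss_1d(0.0, sigma, 1.0, sigma)
        w2 = w2_squared_gauss_1d(0.0, sigma, 1.0, sigma)
        print(f"sigma={sigma:g}  KL={kl:g}  W2^2={w2:g}")
```

With equal standard deviations the classical KL-divergence reduces to $(\mu_1-\mu_2)^2/(2\sigma^2)$, which grows without bound as $\sigma \to 0$, while the squared Wasserstein distance stays equal to the squared distance between the means. This is the behavior of the sample-space geometry that the WKL-divergence is described as respecting.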
