Wasserstein KL-divergence for Gaussian distributions
Adwait Datar, Nihat Ay
TL;DR
The paper defines a Wasserstein-based analogue of the KL-divergence, $D_{\rm WKL}$, on $\mathbb{R}^n$ via gradient-flow transport of a potential, and derives an explicit formula for Gaussian distributions in terms of their means and covariances. It provides a unified multivariate expression using matrices $R$ and $Q$, and a univariate corollary showing that the divergence stays finite in the Dirac (zero-variance) limit, whereas the classical KL-divergence between two distinct point masses is infinite. The results connect Wasserstein geometry with information geometry for Gaussian distributions and may offer benefits for optimization and estimation tasks. The work lays groundwork for future information-theoretic applications of $D_{\rm WKL}$ and for its empirical evaluation in machine learning.
Abstract
We introduce a new version of the KL-divergence for Gaussian distributions, based on Wasserstein geometry, which we refer to as the WKL-divergence. We show that this version is consistent with the geometry of the sample space $\mathbb{R}^n$. In particular, we can evaluate the WKL-divergence between the Dirac measures concentrated at two points, and it turns out to be proportional to the squared distance between these points.
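To illustrate why a finite Dirac limit is notable, the sketch below contrasts two familiar quantities for univariate Gaussians $N(x, \sigma^2)$ and $N(y, \sigma^2)$ as $\sigma \to 0$: the classical KL-divergence, which equals $(x-y)^2/(2\sigma^2)$ and blows up, and the squared 2-Wasserstein distance, which stays at $(x-y)^2$. This is only a hedged illustration under stated assumptions: it does not compute $D_{\rm WKL}$ itself (its explicit formula in terms of $R$ and $Q$ is not reproduced here), and the function names `kl_gauss_1d` and `w2_sq_gauss_1d` are hypothetical helpers introduced for this example.

```python
import math


def kl_gauss_1d(m1, s1, m2, s2):
    """Classical KL(N(m1, s1^2) || N(m2, s2^2)) for univariate Gaussians."""
    return math.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5


def w2_sq_gauss_1d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between univariate Gaussians.

    Shown only as a geometry-aware reference quantity; this is NOT D_WKL.
    """
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


# Two points on the real line; shrink the common standard deviation toward 0
# so both Gaussians approach Dirac measures at x and y.
x, y = 0.0, 3.0
for s in [1.0, 0.1, 0.01, 0.001]:
    kl = kl_gauss_1d(x, s, y, s)
    w2 = w2_sq_gauss_1d(x, s, y, s)
    print(f"sigma={s:g}: KL={kl:.4g}, W2^2={w2:.4g}")
```

With equal variances the classical KL grows like $9/(2\sigma^2)$ here, while the squared Wasserstein distance stays equal to $(x-y)^2 = 9$, mirroring the kind of finite, distance-based behavior the abstract describes for the WKL-divergence.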
