Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence

Clémentine Chazal; Anna Korba; Francis Bach

TL;DR

This work introduces a regularized Kernel Kullback-Leibler divergence, $KKL_\alpha(p||q)$, to compare probability measures via kernel covariance operators in an RKHS while circumventing finiteness issues for distributions with disjoint supports. It provides a tractable closed-form for the regularized divergence on discrete measures, derives finite-sample and perturbation bounds relating $KKL_\alpha$ to the original $KKL$, and develops a Wasserstein gradient-flow framework for optimizing with respect to $p$, including explicit update rules. The paper establishes monotonicity in the regularization parameter and demonstrates the approach on synthetic experiments, showing improved handling of disjoint supports and preservation of target supports compared to MMD and KALE. Overall, it contributes a robust, implementable alternative to kernel-based divergences with theoretical guarantees and practical optimization tools for discrete and sample-based distributions.

Abstract

In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence that involves density ratios, the KKL compares probability distributions through covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS), and computes the Kullback-Leibler quantum divergence. This novel divergence hence shares parallel but different aspects with both the standard Kullback-Leibler divergence between probability distributions and kernel embedding metrics such as the maximum mean discrepancy. A limitation of the original KKL divergence is that it is not defined for distributions with disjoint supports. To solve this problem, we propose in this paper a regularised variant that guarantees that the divergence is well defined for all distributions. We derive bounds that quantify the deviation of the regularised KKL from the original one, as well as finite-sample bounds. In addition, we provide a closed-form expression for the regularised KKL, specifically applicable when the distributions consist of finite sets of points, which makes it implementable. Furthermore, we derive a Wasserstein gradient descent scheme of the KKL divergence in the case of discrete distributions, and study empirically its properties to transport a set of points to a target distribution.
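
As a rough illustration of why the regularised variant stays finite on disjoint supports, the sketch below evaluates the divergence in a finite-dimensional feature space. It assumes the definitions $\mathrm{KKL}(p||q) = \mathrm{Tr}[\Sigma_p(\log \Sigma_p - \log \Sigma_q)]$ and $\mathrm{KKL}_{\alpha}(p||q) = \mathrm{KKL}(p||\alpha p + (1-\alpha) q)$, which the abstract does not spell out. The random Fourier feature map, the function names and the eigenvalue tolerance are hypothetical choices made for this example; this is not the paper's closed-form expression, which may be organised differently.

```python
import numpy as np


def features(x, w):
    """Hypothetical explicit feature map (random Fourier features).

    x: (n, d) points, w: (D, d) frequencies. Each output row has unit
    norm, so the induced kernel satisfies k(x, x) = 1.
    """
    proj = x @ w.T
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(w.shape[0])


def covariance(phi):
    """Empirical (uncentered) covariance matrix (1/n) sum_i phi_i phi_i^T."""
    return phi.T @ phi / phi.shape[0]


def trace_a_log_b(a, b, tol=1e-12):
    """Tr(a log b), restricted to eigenvectors of b with eigenvalue > tol.

    This is only valid when the range of a lies inside the range of b,
    which holds below because b is a mixture that includes a.
    """
    vals, vecs = np.linalg.eigh(b)
    keep = vals > tol
    v = vecs[:, keep]
    # quad[j] = v_j^T a v_j for each retained eigenvector v_j
    quad = np.einsum("ij,ik,kj->j", v, a, v)
    return float(np.sum(np.log(vals[keep]) * quad))


def kkl_alpha(x, y, w, alpha):
    """Regularised KKL between the empirical measures of x and y (assumed form)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("this sketch requires 0 < alpha <= 1")
    sigma_p = covariance(features(x, w))
    sigma_q = covariance(features(y, w))
    sigma_mix = alpha * sigma_p + (1.0 - alpha) * sigma_q
    return trace_a_log_b(sigma_p, sigma_p) - trace_a_log_b(sigma_p, sigma_mix)


# Two point clouds with disjoint supports (illustrative data only).
rng = np.random.default_rng(0)
w = rng.normal(size=(30, 2))
x = rng.uniform(0.0, 1.0, size=(15, 2))
y = rng.uniform(5.0, 6.0, size=(15, 2))
for alpha in (0.01, 0.1, 0.5, 1.0):
    print(alpha, kkl_alpha(x, y, w, alpha))
```

Because the mixture $\alpha \Sigma_p + (1-\alpha)\Sigma_q$ always contains the range of $\Sigma_p$ when $\alpha > 0$, the second trace is finite even though the two samples share no points.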

Paper Structure (36 sections, 12 theorems, 105 equations, 13 figures)

Introduction
Regularized kernel Kullback-Leibler ($\mathrm{KKL}$) divergence
Notations.
Kernel Kullback-Leibler divergence ($\mathrm{KKL}$).
Definition of the regularized KKL.
Skewness and concentration of the regularized $\mathrm{KKL}$
Skewness.
Statistical properties.
Time-discretized regularized $\mathrm{KKL}$ gradient flow
Regularized $\mathrm{KKL}$ closed-form.
Gradient flow and closed-form for the derivatives.
Related work
Divergences based on kernels embeddings.
Kernel variational approximation of the KL.
Experiments
...and 21 more sections

Key Result

Proposition 2

Let $p \ll q$. The function $\alpha \mapsto \mathrm{KKL}_{\alpha}(p||q)$ is decreasing on $[0,1]$.
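
To see what this monotonicity implies, it helps to look at the endpoints. Assuming the regularized divergence is defined through the mixture $\mathrm{KKL}_{\alpha}(p||q) = \mathrm{KKL}(p||\alpha p + (1-\alpha) q)$ (an assumption here, since the definition is not reproduced on this page), one gets

$$
\mathrm{KKL}_{0}(p||q) = \mathrm{KKL}(p||q), \qquad \mathrm{KKL}_{1}(p||q) = \mathrm{KKL}(p||p) = 0,
$$

so that, under this assumption, Proposition 2 gives $0 \le \mathrm{KKL}_{\alpha}(p||q) \le \mathrm{KKL}(p||q)$ for every $\alpha \in [0,1]$: increasing $\alpha$ trades fidelity to the original divergence for a smaller, better-behaved value.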

Figures (13)

Figure 1: Concentration of empirical $\mathrm{KKL}_{\alpha}$ for $d=10$, $\sigma = 10$, $p,q$ Gaussians.
Figure 2: MMD, KALE and $\mathrm{KKL}$ flow for 3 rings target.
Figure 3: Shape transfer
Figure 4: $\alpha = 0.01$, $p,q$ Gaussians, $\sigma$ is the square of the mean of distances between $\hat{p}$ and $\hat{q}$.
Figure 5: $\alpha = 0.1$, $\sigma = 2$.
...and 8 more figures

Theorems & Definitions (21)

Remark 1
Proposition 2
Proposition 3
Proposition 4
Remark 5
Proposition 6
Proposition 7
Proposition 8
Proof
Proposition 9
...and 11 more
