
Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces

Viktor Stein, Sebastian Neumayer, Nicolaj Rux, Gabriele Steidl

TL;DR

This work addresses the limitation of classical $f$-divergences in handling measures with limited support by introducing a squared MMD regularization with a characteristic kernel. The authors establish that the regularized divergences can be written as the Moreau envelope of a convex functional on the associated RKHS, enabling analysis of gradients and Wasserstein gradient flows via kernel mean embeddings. They prove existence and uniqueness of the Wasserstein gradient flows, derive Fréchet-differentiable gradients, and show $\lambda$-convexity along generalized geodesics under smooth kernels. Numerical experiments with Tsallis-$\alpha$ divergences demonstrate effective flow behavior from empirical measures and illustrate the roles of finite versus infinite recession constants and tight variational formulations, with practical implications for variational inference and generative modeling.

Abstract

Commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy is regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. We use the kernel mean embedding to show that this regularization can be rewritten as the Moreau envelope of some function on the associated reproducing kernel Hilbert space. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences, particularly their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. We provide proof-of-concept numerical examples for flows starting from empirical measures. Here, we cover $f$-divergences with infinite and finite recession constants. Lastly, we extend our results to the tight variational formulation of $f$-divergences and numerically compare the resulting flows.
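To make the regularizer concrete, the following is a minimal sketch of the squared MMD between two empirical measures. It assumes a Gaussian kernel (one example of a characteristic kernel) with a hypothetical bandwidth `sigma`; the function names are illustrative and not taken from the paper. The two point sets have disjoint supports, which is the situation where the Kullback-Leibler divergence is infinite while the squared MMD stays finite.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    The bandwidth sigma is an illustrative assumption.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma=1.0):
    """Average of K(x, y) over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, sigma=1.0):
    """Squared MMD between the uniform empirical measures on xs and ys.

    Equals the squared RKHS distance between the two kernel mean embeddings.
    """
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


# Two empirical measures with disjoint supports.
mu = [(0.0, 0.0), (1.0, 0.0)]
nu = [(5.0, 5.0), (6.0, 5.0)]

print(mmd_squared(mu, nu))  # finite and strictly positive
print(mmd_squared(mu, mu))  # vanishes for identical measures
```

The double loop costs $O(nm)$ kernel evaluations, which is fine for a sketch; a vectorized Gram-matrix computation would be the usual choice for larger particle sets.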

Paper Structure

This paper contains 28 sections, 21 theorems, 123 equations, 12 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

The Moreau envelope of $G\in \Gamma_0(\mathcal{H})$ has the following properties:
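For orientation, the standard facts about Moreau envelopes on a Hilbert space can be stated as follows. The normalization $\frac{1}{2\lambda}$ is the common convention and is assumed here; it is not taken from the theorem statement itself. For $\lambda > 0$ and $h \in \mathcal{H}$,

$$G^{\lambda}(h) := \min_{g \in \mathcal{H}} \Big\{ G(g) + \frac{1}{2\lambda} \| h - g \|_{\mathcal{H}}^{2} \Big\},$$

where the minimum is attained at a unique point, the proximal point $\operatorname{prox}_{\lambda G}(h)$. Under this convention, $G^{\lambda}$ is convex, finite everywhere, and Fréchet differentiable with

$$\nabla G^{\lambda}(h) = \frac{1}{\lambda} \big( h - \operatorname{prox}_{\lambda G}(h) \big),$$

and $\nabla G^{\lambda}$ is $\frac{1}{\lambda}$-Lipschitz. Moreover, $G^{\lambda}(h) \uparrow G(h)$ as $\lambda \downarrow 0$ for every $h \in \mathcal{H}$.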

Figures (12)

  • Figure 1: WGF of the regularized Tsallis-$\alpha$ divergence $D_{f_{\alpha}, \nu}^{\lambda}$ for $\alpha \in \{ 1, 3, 7.5 \}$.
  • Figure 3: WGF of the regularized Tsallis-$7.5$ divergence $D_{f_{7.5}, \nu}^{\lambda}$ for Neal's cross.
  • Figure 5: WGF of the regularized Tsallis-3 divergence $D_{f_{3}, \nu}^{\lambda}$ with the bananas target.
  • Figure 6: WGF of the regularized TV divergence $D_{f_{\mathrm{TV}}, \nu}^{\lambda}$ with the three rings target.
  • Figure 7: WGF of the regularized Tsallis-$\frac{1}{2}$ divergence $D_{f_{\frac{1}{2}}, \nu}^{\lambda}$ without annealing (top) and with annealing (bottom), where $\nu$ is the three rings target.
  • ...and 7 more figures

Theorems & Definitions (45)

  • Theorem 1
  • Remark 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Example 5: Rescaled Marton divergence
  • Lemma 6
  • proof
  • Lemma 7
  • ...and 35 more