Wasserstein Gradient Flows for Moreau Envelopes of f-Divergences in Reproducing Kernel Hilbert Spaces
Viktor Stein, Sebastian Neumayer, Nicolaj Rux, Gabriele Steidl
TL;DR
This work addresses the support restrictions that classical $f$-divergences impose on the measures involved by regularizing them with a squared MMD associated with a characteristic kernel. The authors establish that the regularized divergences can be written as the Moreau envelope of a convex functional on the associated RKHS, enabling analysis of gradients and Wasserstein gradient flows via kernel mean embeddings. They prove existence and uniqueness of the Wasserstein gradient flows, derive the gradients of the Fréchet-differentiable regularized functionals, and show $\lambda$-convexity along generalized geodesics under smooth kernels. Numerical experiments with Tsallis-$\alpha$ divergences demonstrate effective flow behavior starting from empirical measures and illustrate the roles of finite versus infinite recession constants and of the tight variational formulation, with practical implications for variational inference and generative modeling.
Abstract
Commonly used $f$-divergences of measures, e.g., the Kullback-Leibler divergence, are subject to limitations regarding the support of the involved measures. A remedy is regularizing the $f$-divergence by a squared maximum mean discrepancy (MMD) associated with a characteristic kernel $K$. We use the kernel mean embedding to show that this regularization can be rewritten as the Moreau envelope of a function on the associated reproducing kernel Hilbert space. Then, we exploit well-known results on Moreau envelopes in Hilbert spaces to analyze the MMD-regularized $f$-divergences, particularly their gradients. Subsequently, we use our findings to analyze Wasserstein gradient flows of MMD-regularized $f$-divergences. We provide proof-of-concept numerical examples for flows starting from empirical measures. Here, we cover $f$-divergences with infinite and finite recession constants. Lastly, we extend our results to the tight variational formulation of $f$-divergences and numerically compare the resulting flows.
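
To make the regularizing term concrete, the following is a minimal sketch of the squared MMD between two uniform empirical measures. It is an illustration only, not the authors' implementation: the Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_kernel` and `squared_mmd` are assumptions chosen here, and the estimate is the plain (biased) plug-in formula.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gram matrix K(x_i, y_j) of a Gaussian kernel (assumed choice of
    characteristic kernel; the bandwidth sigma is a free parameter)."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))


def squared_mmd(x, y, sigma=1.0):
    """Squared MMD between the uniform empirical measures supported on the
    rows of x (shape (n, d)) and y (shape (m, d)), computed as
    mean K(x, x) + mean K(y, y) - 2 * mean K(x, y)."""
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return k_xx + k_yy - 2.0 * k_xy
```

For two point clouds with disjoint supports, `squared_mmd` returns a finite positive number, whereas the Kullback-Leibler divergence between the corresponding empirical measures is infinite; this is the support limitation that the regularization is meant to remove.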
