Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows
Nicolaj Rux, Michael Quellmalz, Gabriele Steidl
TL;DR
This paper addresses the nondifferentiability and limited theoretical guarantees of the classical negative distance kernel in maximum mean discrepancy (MMD). It constructs a smoothed distance kernel with a Lipschitz continuous gradient by smoothing the absolute value in 1D and then applying a Riemann-Liouville fractional integral. The construction preserves conditional positive definiteness of order one and near-linear growth. The authors consider the radial kernel $K(x,y)=F(\|x\|)+F(\|y\|)-F(\|x-y\|)$ with $F=\mathcal{I}_d[f]$, where $f=|\cdot|*u$ is a smoothed absolute value. They show that $K$ is positive definite and characteristic and has a Lipschitz gradient, which makes the Wasserstein gradient flows of the associated MMD well-posed. They prove existence and convergence of the gradient flow in the Wasserstein-2 space for the smoothed kernel and derive a practical Euler discretization for empirical measures. Experiments in 2D and on MNIST, aided by a slicing-based fast summation scheme, show that the smoothed kernel converges robustly and recovers the target effectively. Overall, the work offers a theoretically justified and computationally practical alternative to the negative distance kernel for kernel-based variational methods in high dimensions.
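The Euler discretization of the MMD gradient flow for empirical measures can be illustrated with a standard explicit particle scheme. This is a minimal sketch, not the authors' code, and it relies on two assumptions. First, it uses a stand-in smoothed distance kernel $K(x,y)=-\sqrt{\|x-y\|^2+\varepsilon^2}$ instead of the paper's $F=\mathcal{I}_d[f]$. This negative multiquadric is also conditionally positive definite of order one, grows linearly, and has a gradient with Lipschitz constant $1/\varepsilon$. Second, each particle $x_i$ moves along the negative gradient of the first variation of $\tfrac12\mathrm{MMD}^2(\mu,\nu)$. The functions `kernel_grad`, `mmd2`, and `mmd_flow` and the parameters `eps`, `tau`, and `n_steps` are hypothetical names.

```python
import numpy as np


def kernel_grad(x, y, eps):
    """Gradient in the first argument of the STAND-IN kernel
    K(x, y) = -sqrt(||x - y||^2 + eps^2) (not the paper's F = I_d[f]).

    x has shape (N, d), y has shape (M, d); returns shape (N, M, d).
    """
    diff = x[:, None, :] - y[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1, keepdims=True) + eps ** 2)
    return -diff / dist  # smooth at x = y, where it equals zero


def mmd2(x, y, eps):
    """Squared MMD between the empirical measures of x and y (V-statistic)."""
    def k(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return -np.sqrt(sq + eps ** 2)
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()


def mmd_flow(x0, y, eps=0.1, tau=0.1, n_steps=500):
    """Explicit Euler steps x_i <- x_i - tau * grad_x (dF/dmu)(x_i), where
    F = 0.5 * MMD^2(mu, nu) and dF/dmu(x) = mean_j K(x, x_j) - mean_l K(x, y_l)."""
    x = np.array(x0, dtype=float, copy=True)
    for _ in range(n_steps):
        interaction = kernel_grad(x, x, eps).mean(axis=1)  # repulsion among particles
        attraction = kernel_grad(x, y, eps).mean(axis=1)   # pull toward target samples
        x -= tau * (interaction - attraction)
    return x
```

As a usage example, target samples `y = rng.normal(size=(50, 2)) + 3.0` and initial particles `x0 = rng.normal(size=(50, 2))` can be passed to `mmd_flow(x0, y)`, and `mmd2` can be used to monitor progress. Each step costs $O(N(N+M)d)$ operations. The slicing-based fast summation is what keeps such flows affordable for large particle numbers.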
Abstract
Negative distance kernels $K(x,y) := - \|x-y\|$ have been used in the definition of maximum mean discrepancies (MMDs) in statistics and have led to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, due to its non-smoothness at $x=y$, most of the classical theoretical results, e.g., on Wasserstein gradient flows of the corresponding MMD functional, no longer hold. In this paper, we propose a new kernel that retains the favorable properties of the negative distance kernel, namely conditional positive definiteness of order one, a nearly linear increase towards infinity, and a simple slicing structure, but is now Lipschitz differentiable. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs comparably to the negative distance kernel in gradient descent methods, but now with theoretical guarantees.
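The two-step construction first smooths $|\cdot|$ in 1D and then lifts the result to a radial profile on $\mathbb{R}^d$. It can be sketched numerically under two assumptions. First, the smoothing density $u$ is taken to be uniform on $[-\varepsilon,\varepsilon]$, which gives $f(t)=(t^2+\varepsilon^2)/(2\varepsilon)$ for $|t|<\varepsilon$ and $f(t)=|t|$ otherwise. The paper's choice of $u$ may differ. Second, the sketch does not implement the Riemann-Liouville operator $\mathcal{I}_d$ in closed form. It evaluates the slicing average $F(\|x\|)=\mathbb{E}_{\xi}[f(\langle\xi,x\rangle)]$ over directions $\xi$ uniform on $\mathbb{S}^{d-1}$ instead, assuming this agrees with $\mathcal{I}_d[f]$ up to a constant factor. The names `smoothed_abs` and `sliced_radial_profile` are hypothetical.

```python
import numpy as np


def smoothed_abs(t, eps):
    """f = |.| * u with u uniform on [-eps, eps] (an ASSUMED choice of u).

    Equals |t| for |t| >= eps and (t^2 + eps^2) / (2 eps) otherwise,
    so f is C^1 with a Lipschitz derivative (slope 1/eps near 0).
    """
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t >= eps, t, (t ** 2 + eps ** 2) / (2.0 * eps))


def sliced_radial_profile(f, r, d, n_nodes=200):
    """F(r) = E_xi f(r <xi, e_1>) for xi uniform on the sphere S^{d-1}.

    The angle theta between xi and e_1 has density proportional to
    sin(theta)^(d-2) on [0, pi]; the expectation is computed with
    Gauss-Legendre quadrature in theta. ASSUMED to match I_d[f] up to scaling.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    theta = 0.5 * np.pi * (nodes + 1.0)        # map [-1, 1] onto [0, pi]
    w = weights * np.sin(theta) ** (d - 2)     # unnormalized angular weights
    w = w / w.sum()                            # normalize to a probability vector
    r = np.atleast_1d(np.asarray(r, dtype=float))
    return (f(np.outer(r, np.cos(theta))) * w).sum(axis=1)
```

For $f=|\cdot|$ and $d=3$ the sliced average equals $\|x\|/2$, because $\langle\xi,e_1\rangle$ is uniform on $[-1,1]$. This gives a quick sanity check. Since $0\le f(t)-|t|\le\varepsilon/2$ for all $t$, the smoothed profile differs from the unsmoothed one by at most $\varepsilon/2$ everywhere, which is consistent with the near-linear growth of the new kernel.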
