Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

Nicolaj Rux, Michael Quellmalz, Gabriele Steidl

TL;DR

This paper addresses the nondifferentiability and limited theoretical guarantees of the classical negative distance kernel in maximum mean discrepancy (MMD) by constructing a Lipschitz-differentiable, smoothed distance kernel via a 1D smoothing of the absolute value followed by a Riemann-Liouville fractional integral, preserving conditional positive definiteness of order one and near-linear growth. The authors establish a radial kernel $K(x,y)=F(\|x-y\|)-F(\|x\|)-F(\|y\|)$ with $F=\mathcal{I}_d[f]$, where $f=|\cdot|*u$ is smoothed, show that $K$ is positive definite, characteristic, and has a Lipschitz gradient, enabling well-posed Wasserstein gradient flows of the associated MMD. They prove existence and convergence of the gradient-flow dynamics in the Wasserstein-2 space for the smoothed kernel, derive a practical Euler discretization for empirical measures, and demonstrate through 2D and MNIST experiments that the smoothed kernel achieves robust convergence and effective target recovery, aided by a slicing-based fast summation scheme. Overall, the work offers a theoretically justified and computationally practical alternative to the negative distance kernel for kernel-based variational methods in high dimensions.
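The 1D smoothing step $f=|\cdot|*u$ can be illustrated with a small numerical sketch. It is not the authors' implementation: the mollifier $u$ is assumed here to be a hat function scaled to width $\varepsilon$, and the names `hat`, `smoothed_abs_quadrature`, and `smoothed_abs_closed_form` are hypothetical. For this assumed $u$, the convolution has the closed form $f(x)=\varepsilon\,(1/3+s^2-s^3/3)$ with $s=|x|/\varepsilon<1$ and $f(x)=|x|$ otherwise, so $f$ is twice continuously differentiable and agrees with $|x|$ away from the origin.

```python
import numpy as np

def hat(t):
    """Hat function on [-1, 1] with unit mass (an assumed choice of u)."""
    return np.maximum(1.0 - np.abs(t), 0.0)

def smoothed_abs_quadrature(x, eps=1.0, n=20001):
    """Approximate f(x) = (|.| * u_eps)(x) by the trapezoidal rule,
    where u_eps(t) = hat(t / eps) / eps is supported on [-eps, eps]."""
    t = np.linspace(-eps, eps, n)
    vals = np.abs(x - t) * hat(t / eps) / eps
    h = t[1] - t[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

def smoothed_abs_closed_form(x, eps=1.0):
    """Closed form of the same convolution for the assumed hat mollifier."""
    s = abs(x) / eps
    if s >= 1.0:
        return abs(x)  # smoothing leaves |x| unchanged outside [-eps, eps]
    return eps * (1.0 / 3.0 + s**2 - s**3 / 3.0)

if __name__ == "__main__":
    for x in (0.0, 0.3, 0.9, 2.0):
        print(x, smoothed_abs_quadrature(x), smoothed_abs_closed_form(x))
```

The second derivative of this $f$ is $2u_\varepsilon$, which is bounded, so $f'$ is Lipschitz; this is the property that the plain absolute value lacks at the origin.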

Abstract

Negative distance kernels $K(x,y) := - \|x-y\|$ have been used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for handling high-dimensional kernel summations profit from the simple parameter-free structure of the distance kernel. However, because the kernel is non-smooth at $x=y$, most of the classical theoretical results, e.g. on Wasserstein gradient flows of the corresponding MMD functional, no longer hold. In this paper, we propose a new kernel that keeps the favorable properties of the negative distance kernel, namely conditional positive definiteness of order one, a nearly linear increase towards infinity, and a simple slicing structure, but is now Lipschitz differentiable. Our construction is based on a simple 1D smoothing procedure of the absolute value function followed by a Riemann-Liouville fractional integral transform. Numerical results demonstrate that the new kernel performs about as well as the negative distance kernel in gradient descent methods, but now with theoretical guarantees.
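For orientation, the squared MMD between two empirical measures with the negative distance kernel $K(x,y) = -\|x-y\|$ can be sketched as follows. This is a minimal illustration, not the paper's code: it uses the plain V-statistic with dense pairwise distances (no slicing or fast summation), the function names are hypothetical, and normalization conventions (for example a factor $1/2$) differ between references.

```python
import numpy as np

def neg_distance_kernel(X, Y):
    """Pairwise K(x, y) = -||x - y|| for rows of X (n, d) and Y (m, d)."""
    diff = X[:, None, :] - Y[None, :, :]
    return -np.linalg.norm(diff, axis=-1)

def mmd_squared(X, Y, kernel=neg_distance_kernel):
    """V-statistic estimate of MMD^2 between the empirical measures
    supported on the rows of X and Y (assumed normalization: no 1/2)."""
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    Y = rng.normal(loc=1.0, size=(300, 2))
    print("MMD^2(X, Y):", mmd_squared(X, Y))
    print("MMD^2(X, X):", mmd_squared(X, X))
```

The dense computation costs $O(nm)$ kernel evaluations, which is the cost that slicing-based summation is meant to reduce in high-dimensional or large-sample settings.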

Paper Structure

This paper contains 26 sections, 30 theorems, 172 equations, 12 figures, 1 table.

Key Result

Theorem 2.1

The function $f(x) \coloneqq \|x\|^{\beta}$, $x \in \mathbb{R}^d$, with $\beta>0$, $\beta \not \in 2 \mathbb{N}$ has the generalized Fourier transform of order $r= \lceil \frac{\beta}{2} \rceil$. In particular, we have for $\mathop{\mathrm{abs}}\nolimits(x)=|x|$, $x\in\mathbb{R}$, that

Figures (12)

  • Figure 1: Smoothed absolute value $f=\mathop{\mathrm{abs}}\nolimits*M_2$ (solid, blue) with its first (solid, green) and second (solid, orange) derivatives, the latter being equal to $2M_2$; and $F=2\mathcal{I}_3[f]$ (dashed blue) with its first (dashed, green) and second (dashed, orange) derivatives.
  • Figure 2: Target measures $\nu$ (blue) and initialization $\gamma^{(0)}$ (orange).
  • Figure 3: MMD flow \eqref{eq:mmd-flow} with step size $\tau=0.01$. For the Gaussian kernel, the result depends heavily on the choice of the parameter $\sigma$. For our SND kernel with small $\varepsilon$, the performance is as good as for the ND kernel, which is better than for the Gaussians.
  • Figure 4: $W_2$ error between the Three-Rings target $\nu$ and the flow $\gamma_t^\tau$ after $k$ steps, at time $t=\tau k$. We compare single precision (first row) and double precision (second row) for step sizes $\tau=0.1$ (left) and $\tau=0.01$ (right). In single precision, SND with $\varepsilon = 0.01$ and ND have the smallest error, which plateaus at $\approx 10^{-3}$. In double precision, SND with $\varepsilon = 0.01$ even outperforms ND. For an explanation, see Proposition \ref{prop:dirac_flow}.
  • Figure 5: MMD flow \eqref{eq:mmd-flow} with step size $\tau=0.02$. For the Gaussian kernel, the result depends heavily on the choice of the parameter $\sigma$. For our SND kernel with small $\varepsilon$, the performance is as good as for the ND kernel, which is better than for the Gaussians.
  • ...and 7 more figures

Theorems & Definitions (40)

  • Theorem 2.1: Wendland2004
  • Proposition 2.2
  • Theorem 2.3: Bochner's Theorem for Generalized Fourier Transform
  • Proposition 3.1
  • Proposition 3.2
  • Corollary 3.3
  • Example 3.4
  • Corollary 3.5
  • Lemma 4.1
  • Theorem 4.2
  • ...and 30 more