Generative Sliced MMD Flows with Riesz Kernels

Johannes Hertrich, Christian Wald, Fabian Altekrüger, Paul Hagemann

TL;DR

The paper addresses the high computational cost of maximum mean discrepancy (MMD) flows on large sample sets by exploiting Riesz kernels. It shows that, for these kernels, the MMD (with a suitably scaled kernel) equals the sliced MMD, so gradients can be computed from one-dimensional projections. For the case $r=1$, a sorting-based method reduces gradient evaluation to $O((M+N)\log(M+N))$, and using a finite number $P$ of projections yields a stochastic gradient estimate with error $O(\sqrt{d/P})$, making large-scale gradient-flow training tractable. The authors formulate Generative MMD Flows using a discretized gradient flow with optional momentum and train a sequence of neural networks to approximate the steps, achieving scalable image generation on standard benchmarks. They also connect sliced MMD to the Wasserstein-1 distance, provide explicit constants, and validate the approach with extensive experiments on MNIST, FashionMNIST, CIFAR10, and CelebA. Overall, the work offers a practical, efficient framework for gradient-flow-based generative modelling via sliced MMD with Riesz kernels and demonstrates strong empirical performance.
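The discretized gradient flow with optional momentum can be illustrated on a one-dimensional toy problem. The Python sketch below rests on assumptions of ours rather than on the paper's exact equations: a heavy-ball update $v \leftarrow m\,v + N\,\nabla_{x}\mathcal{D}^2$, $x \leftarrow x - \tau v$ with step size $\tau$ and momentum $m$, the Riesz kernel with $r=1$, and hypothetical helper names. It omits the neural networks that the paper trains to approximate the flow steps.

```python
import bisect
import random


def grad_1d_sorted(xs, ys):
    """Gradient of D^2 w.r.t. each x_i for K(s, t) = -|s - t| on R, via sorting."""
    n, m = len(xs), len(ys)
    xs_s, ys_s = sorted(xs), sorted(ys)
    out = []
    for s in xs:
        # counts of points left minus right of s; ties contribute 0
        inter = bisect.bisect_left(xs_s, s) - (n - bisect.bisect_right(xs_s, s))
        pot = bisect.bisect_left(ys_s, s) - (m - bisect.bisect_right(ys_s, s))
        out.append(-2.0 * inter / n**2 + 2.0 * pot / (n * m))
    return out


def energy_distance(xs, ys):
    """D^2 = 2 E|X - Y| - E|X - X'| - E|Y - Y'| for the empirical measures (O(N^2))."""
    n, m = len(xs), len(ys)
    xx = sum(abs(a - b) for a in xs for b in xs) / n**2
    yy = sum(abs(a - b) for a in ys for b in ys) / m**2
    xy = sum(abs(a - b) for a in xs for b in ys) / (n * m)
    return 2.0 * xy - xx - yy


def momentum_mmd_flow(xs, ys, tau, momentum, steps):
    """Heavy-ball discretization of the MMD particle flow (assumed form, hypothetical)."""
    n = len(xs)
    xs, v = list(xs), [0.0] * n
    for _ in range(steps):
        g = grad_1d_sorted(xs, ys)
        # gradient scaled by N so that each particle moves O(tau) per step
        v = [momentum * vi + n * gi for vi, gi in zip(v, g)]
        xs = [x - tau * vi for x, vi in zip(xs, v)]
    return xs


rng = random.Random(0)
target = [rng.gauss(3.0, 0.5) for _ in range(30)]
start = [rng.random() for _ in range(30)]  # uniform on [0, 1]
flowed = momentum_mmd_flow(start, target, tau=0.01, momentum=0.5, steps=2000)
print(energy_distance(start, target), energy_distance(flowed, target))
```

Setting `momentum=0.0` gives the plain explicit Euler scheme, which makes it easy to compare both variants on the same toy data.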

Abstract

Maximum mean discrepancy (MMD) flows suffer from high computational costs in large-scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \|x-y\|^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementation, we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
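For $r=1$ in one dimension, the gradient only depends on the signs of pairwise differences, which is why sorting helps. The Python sketch below is a minimal illustration, not the authors' implementation: it assumes the empirical objective $\mathcal{D}^2 = -\frac{1}{N^2}\sum_{i,j}|x_i-x_j| + \frac{2}{NM}\sum_{i,k}|x_i-y_k| - \frac{1}{M^2}\sum_{k,l}|y_k-y_l|$ for $N$ particles $x_i$ and $M$ targets $y_k$, and counts the points to the left and right of each $x_i$ with sorting plus binary search. The function names are hypothetical.

```python
import bisect


def sign(t):
    """Sign of t, with sign(0) = 0."""
    return (t > 0) - (t < 0)


def mmd_grad_naive(xs, ys):
    """O(N^2 + NM): gradient of D^2 w.r.t. each x_i for K(x, y) = -|x - y| on R."""
    n, m = len(xs), len(ys)
    grads = []
    for xi in xs:
        inter = sum(sign(xi - xj) for xj in xs)  # interaction (repulsion) term
        pot = sum(sign(xi - yk) for yk in ys)    # potential (attraction) term
        grads.append(-2.0 * inter / n**2 + 2.0 * pot / (n * m))
    return grads


def mmd_grad_sorted(xs, ys):
    """O((N + M) log(N + M)): the same gradient via sorting and binary search."""
    n, m = len(xs), len(ys)
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    grads = []
    for xi in xs:
        # sum_j sign(xi - xj) = #{x_j < xi} - #{x_j > xi}; ties contribute 0
        inter = bisect.bisect_left(xs_sorted, xi) - (n - bisect.bisect_right(xs_sorted, xi))
        pot = bisect.bisect_left(ys_sorted, xi) - (m - bisect.bisect_right(ys_sorted, xi))
        grads.append(-2.0 * inter / n**2 + 2.0 * pot / (n * m))
    return grads
```

For example, `mmd_grad_sorted([0.0, 1.0], [0.5])` returns `[-0.5, 0.5]`, so a gradient descent step moves both particles toward the target point $0.5$. The paper's algorithm may organize the sorting differently; the point here is only that the counts replace the quadratic double sum.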

Paper Structure (17 sections, 9 theorems, 67 equations, 10 figures, 2 tables, 3 algorithms)

Key Result

Theorem 1

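Let ${\mathrm k}(x,y) \coloneqq -|x-y|^r$ for $x,y\in\mathbb{R}$, $r\in(0,2)$, and let $\mathcal{SD}_{\mathrm k}^2(\mu,\nu) \coloneqq \mathbb{E}_{\xi\sim\mathcal{U}_{\mathbb{S}^{d-1}}}\big[\mathcal{D}_{\mathrm k}^2(P_{\xi\#}\mu,P_{\xi\#}\nu)\big]$ denote the sliced MMD, where $P_\xi(x) \coloneqq \langle \xi, x\rangle$. Then, for $\mu, \nu \in \mathcal{P}_r(\mathbb{R}^d)$, it holds $\mathcal{SD}_{\mathrm k}^2(\mu,\nu) = \mathcal{D}_{\mathrm{K}}^2(\mu,\nu)$ with the associated scaled Riesz kernel $\mathrm{K}(x,y) \coloneqq -c_{d,r}^{-1}\|x-y\|^r$, where $c_{d,r} \coloneqq \frac{\sqrt{\pi}\,\Gamma(\frac{d+r}{2})}{\Gamma(\frac{d}{2})\,\Gamma(\frac{r+1}{2})}$.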
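Because the MMD is linear in its kernel, Theorem 1 reduces to an identity for a single difference vector $z = x-y$: averaging the one-dimensional kernel over uniformly random directions $\xi\in\mathbb{S}^{d-1}$ gives $\mathbb{E}_\xi|\langle\xi,z\rangle|^r = c_{d,r}^{-1}\|z\|^r$ with $c_{d,r} = \sqrt{\pi}\,\Gamma(\frac{d+r}{2})/(\Gamma(\frac{d}{2})\Gamma(\frac{r+1}{2}))$. The Python sketch below checks this numerically by Monte Carlo; the closed form for $c_{d,r}$ is our own computation of $1/\mathbb{E}|\xi_1|^r$ for a uniform direction, and the helper names are hypothetical.

```python
import math
import random


def c_dr(d, r):
    """Constant with E|<xi, z>|^r = ||z||^r / c_{d,r} for xi uniform on S^{d-1}."""
    return math.sqrt(math.pi) * math.gamma((d + r) / 2) / (
        math.gamma(d / 2) * math.gamma((r + 1) / 2)
    )


def random_direction(d, rng):
    """Uniform sample from the unit sphere S^{d-1} via a normalized Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def sliced_riesz(z, r, num_slices, rng):
    """Monte Carlo estimate of E_xi |<xi, z>|^r, i.e. minus the sliced kernel at x - y = z."""
    total = 0.0
    for _ in range(num_slices):
        xi = random_direction(len(z), rng)
        total += abs(sum(a * b for a, b in zip(xi, z))) ** r
    return total / num_slices


rng = random.Random(0)
z = [1.0, -2.0, 0.5, 3.0, 0.0]  # d = 5
r = 1.0
estimate = sliced_riesz(z, r, 20000, rng)
exact = math.sqrt(sum(t * t for t in z)) ** r / c_dr(len(z), r)
print(estimate, exact)  # should agree up to Monte Carlo error
```

As sanity checks, `c_dr(3, 1.0)` evaluates to 2 up to rounding, matching $\mathbb{E}|\xi_1| = 1/2$ on $\mathbb{S}^2$, and `c_dr(d, 2.0)` equals $d$, matching $\mathbb{E}\,\xi_1^2 = 1/d$.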

Figures (10)

  • Figure 1: Left: Comparison of the run time for $1000$ gradient evaluations of naive MMD and sliced MMD with different numbers of projections $P$ in the case $d=100$. Middle and right: Relative error of the sliced MMD gradients with respect to the exact MMD gradients as a function of the number $P$ of projections and the dimension $d$. The relative error behaves asymptotically like $O(\sqrt{d/P})$, as predicted by Theorem 4.
  • Figure 2: Samples and their trajectories from MNIST (left) and CIFAR10 (right) in the MMD flow with momentum (top) and without momentum (bottom), starting from the uniform distribution on $[0,1]^d$, shown after $2^k$ steps with $k\in\{0,\dots,16\}$ (for MNIST) and $k\in\{3,\dots,19\}$ (for CIFAR10). We observe that the MMD flow with momentum converges faster than the MMD flow without momentum.
  • Figure 3: Generated samples of our generative MMD Flow.
  • Figure 4: Comparison of the MMD flow with Gaussian kernel (top) and inverse multiquadric kernel (bottom) for different hyperparameters.
  • Figure 5: Comparison of the MMD flow with Laplacian kernel (top) and Riesz kernel (bottom) for different hyperparameters.
  • ...and 5 more figures
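The stochastic gradient behind Figure 1 and Theorem 4 can be sketched in Python for $r=1$. This is a minimal illustration under our own assumptions, not the authors' code: the objective is the empirical MMD $\mathcal{D}^2$ between $N$ particles $x_i$ and $M$ targets $y_k$ with kernel $K(x,y) = -\|x-y\|$; each of the $P$ slices projects both point clouds onto a random unit direction $\xi$, computes the one-dimensional gradient by sorting, and maps it back along $\xi$; and the average is multiplied by $c_{d,1} = \sqrt{\pi}\,\Gamma(\frac{d+1}{2})/\Gamma(\frac{d}{2})$, since averaging $-|\langle\xi,x-y\rangle|$ over directions yields $-c_{d,1}^{-1}\|x-y\|$. All function names are hypothetical, and the dense gradient is included only as a reference.

```python
import bisect
import math
import random


def grad_1d_sorted(xs, ys):
    """1D gradient of D^2 for K(s, t) = -|s - t| via sorting: O((N+M) log(N+M))."""
    n, m = len(xs), len(ys)
    xs_s, ys_s = sorted(xs), sorted(ys)
    out = []
    for s in xs:
        inter = bisect.bisect_left(xs_s, s) - (n - bisect.bisect_right(xs_s, s))
        pot = bisect.bisect_left(ys_s, s) - (m - bisect.bisect_right(ys_s, s))
        out.append(-2.0 * inter / n**2 + 2.0 * pot / (n * m))
    return out


def sliced_mmd_grad(X, Y, num_slices, rng):
    """Stochastic gradient of D^2 for K(x, y) = -||x - y|| on R^d from P random slices."""
    d = len(X[0])
    c = math.sqrt(math.pi) * math.gamma((d + 1) / 2) / math.gamma(d / 2)  # c_{d,1}
    G = [[0.0] * d for _ in X]
    for _ in range(num_slices):
        xi = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in xi))
        xi = [t / norm for t in xi]  # uniform direction on S^{d-1}
        px = [sum(a * b for a, b in zip(xi, x)) for x in X]
        py = [sum(a * b for a, b in zip(xi, y)) for y in Y]
        for g, s in zip(G, grad_1d_sorted(px, py)):
            for k in range(d):
                g[k] += c * s * xi[k] / num_slices  # chain rule: d<xi, x>/dx = xi
    return G


def exact_mmd_grad(X, Y):
    """Dense O(N^2 + NM) reference gradient of D^2 for K(x, y) = -||x - y||."""
    n, m, d = len(X), len(Y), len(X[0])

    def unit(u, v):
        diff = [a - b for a, b in zip(u, v)]
        norm = math.sqrt(sum(t * t for t in diff))
        return [t / norm for t in diff] if norm > 0 else [0.0] * d

    G = []
    for x in X:
        g = [0.0] * d
        for xj in X:
            u = unit(x, xj)
            for k in range(d):
                g[k] -= 2.0 * u[k] / n**2
        for y in Y:
            u = unit(x, y)
            for k in range(d):
                g[k] += 2.0 * u[k] / (n * m)
        G.append(g)
    return G


def relative_error(G_est, G_ref):
    """Euclidean relative error over all particles and coordinates."""
    num = sum((a - b) ** 2 for ge, gr in zip(G_est, G_ref) for a, b in zip(ge, gr))
    den = sum(b * b for gr in G_ref for b in gr)
    return math.sqrt(num / den)


rng = random.Random(0)
Y = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
X = [[rng.gauss(0.0, 1.0) + 3.0 for _ in range(3)] for _ in range(20)]
reference = exact_mmd_grad(X, Y)
for P in (10, 100, 1000):
    print(P, relative_error(sliced_mmd_grad(X, Y, P, random.Random(P)), reference))
```

The printed relative errors should shrink roughly like $1/\sqrt{P}$ for fixed $d$, in line with the $O(\sqrt{d/P})$ rate, while each slice costs only a sort of the projected points.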

Theorems & Definitions (17)

  • Theorem 1: Sliced Riesz Kernels are Riesz Kernels
  • Theorem 2: Relation between $\mathcal{D}_K$ and $\mathcal{W}_1$ for Distance Kernels
  • Theorem 3: Derivatives of Interaction and Potential Energy
  • Theorem 4: Error Bound for Stochastic MMD Gradients
  • Remark 5: Computational Complexity of Gradient Evaluations
  • Remark 6: Iterative Training and Sampling
  • Remark 7: Extension to $\mathcal{P}_{\frac{r}{2}}(\mathbb{R}^d)$
  • Lemma 8
  • Lemma 9
  • Proof
  • ...and 7 more
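Theorem 2 bounds $\mathcal{D}_K$ by $\mathcal{W}_1$ from above and below for compactly supported measures; its constants are not reproduced here. In one dimension with $r=1$ the mechanism can be seen directly, as a sketch based on standard facts rather than on the paper's proof: the energy distance satisfies $\mathcal{D}^2(\mu,\nu) = 2\int(F_\mu-F_\nu)^2\,dt$ and $\mathcal{W}_1(\mu,\nu) = \int|F_\mu-F_\nu|\,dt$, so $|F_\mu - F_\nu|\le 1$ gives $\mathcal{D}^2 \le 2\mathcal{W}_1$, and Cauchy-Schwarz on a support of length $2R$ gives $\mathcal{W}_1 \le \sqrt{R}\,\mathcal{D}$. The Python check below uses hypothetical helper names and the unscaled kernel $K(x,y)=-|x-y|$.

```python
import bisect
import math


def energy_distance(xs, ys):
    """D^2 = 2 E|X - Y| - E|X - X'| - E|Y - Y'| for empirical measures on R."""
    n, m = len(xs), len(ys)
    xx = sum(abs(a - b) for a in xs for b in xs) / n**2
    yy = sum(abs(a - b) for a in ys for b in ys) / m**2
    xy = sum(abs(a - b) for a in xs for b in ys) / (n * m)
    return 2.0 * xy - xx - yy


def cdf_integrals(xs, ys):
    """Exact integrals of (F - G)^2 and |F - G| (the latter is W1) for empirical CDFs."""
    xs_s, ys_s = sorted(xs), sorted(ys)
    pts = sorted(xs + ys)
    sq = ab = 0.0
    for left, right in zip(pts, pts[1:]):
        # F and G are constant on [left, right)
        diff = bisect.bisect_right(xs_s, left) / len(xs) - bisect.bisect_right(ys_s, left) / len(ys)
        sq += diff * diff * (right - left)
        ab += abs(diff) * (right - left)
    return sq, ab


xs = [math.sin(3.0 * i) for i in range(40)]              # support in [-1, 1], so R = 1
ys = [0.5 * math.cos(7.0 * i) + 0.3 for i in range(25)]  # support in [-0.2, 0.8]
d2 = energy_distance(xs, ys)
sq, w1 = cdf_integrals(xs, ys)
print(d2, 2.0 * sq)                             # energy distance vs. 2 * int (F - G)^2
print(d2 <= 2.0 * w1, w1 <= math.sqrt(1.0 * d2))  # both bounds with R = 1
```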