Deep MMD Gradient Flow without adversarial training

Alexandre Galashov; Valentin de Bortoli; Arthur Gretton

TL;DR

This work introduces Diffusion-MMD Gradient Flow (DMMD), a non-adversarial generative framework that trains a noise-conditioned MMD discriminator along a forward diffusion path and uses a corresponding Wasserstein gradient flow to generate samples. By adapting the kernel through noise levels and leveraging a diffusion-inspired training curriculum, DMMD achieves competitive unconditional image generation on CIFAR-10, MNIST, CelebA-64, and LSUN-Church-64, while avoiding adversarial training altogether. The authors establish theoretical justification for adaptive kernels and provide practical scalability via a linear base kernel, along with an approximate sampling variant and a KALE-flow extension. Empirically, DMMD outperforms several discriminator-flow baselines and demonstrates the viability of discriminative flows as a robust alternative to GANs and diffusion models for high-dimensional generation. The approach offers a principled, non-adversarial path to controlled particle transport toward a target distribution with potential applicability to larger datasets and diffusion-model contexts.
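To make the idea of a discrepancy measured along a forward diffusion path concrete, the following is a minimal NumPy sketch. It corrupts a toy 2-D dataset at three noise levels and estimates the squared MMD between the noisy and the clean samples. Everything here is an assumption for illustration: the fixed-bandwidth RBF kernel stands in for the learned, noise-conditioned kernel, the variance-preserving noising rule and the `alpha_bar` values are generic diffusion-model choices, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix k(x_i, y_j) for rows of x and y."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())


def forward_diffuse(x0, alpha_bar, rng):
    """Variance-preserving noising (assumed form): keep a fraction
    sqrt(alpha_bar) of the signal and add Gaussian noise for the rest."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


rng = np.random.default_rng(0)
# Toy "data distribution": a narrow Gaussian blob away from the origin.
data = 0.3 * rng.standard_normal((500, 2)) + np.array([3.0, 0.0])

# Decreasing alpha_bar means increasing noise level.
alpha_bars = [0.99, 0.5, 0.01]
mmd_values = [mmd2_biased(forward_diffuse(data, a, rng), data) for a in alpha_bars]

for a, m in zip(alpha_bars, mmd_values):
    print(f"alpha_bar={a:.2f}  MMD^2(noisy, clean)={m:.4f}")
```

In this toy setting the estimated discrepancy grows as the noise level increases, which is the kind of signal a noise-conditioned discriminator can be trained on without a generator in the loop.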

Abstract

We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x 64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
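The particle transport described in the abstract can be illustrated with the plain MMD gradient flow that DMMD generalizes. The sketch below is not the authors' method: it assumes a fixed-bandwidth RBF kernel rather than a learned noise-adaptive one, a simple explicit Euler discretization, and toy 2-D Gaussians; all function names, the step size, and the number of steps are hypothetical choices. Each particle moves along the negative gradient of the witness function $f(z) = \frac{1}{n}\sum_i k(x_i, z) - \frac{1}{m}\sum_j k(y_j, z)$, where $x_i$ are the current particles and $y_j$ are target samples.

```python
import numpy as np


def rbf_and_grad(z, c, bandwidth):
    """Return k(c_j, z_i) and its gradient with respect to z_i."""
    diff = c[None, :, :] - z[:, None, :]                  # (n_z, n_c, d): c_j - z_i
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    grad = k[:, :, None] * diff / bandwidth ** 2          # d/dz_i of k(c_j, z_i)
    return k, grad


def witness_gradient(particles, target, bandwidth):
    """Gradient of the MMD witness function evaluated at each particle."""
    _, grad_particles = rbf_and_grad(particles, particles, bandwidth)
    _, grad_target = rbf_and_grad(particles, target, bandwidth)
    return grad_particles.mean(axis=1) - grad_target.mean(axis=1)


def mmd2(x, y, bandwidth):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx, _ = rbf_and_grad(x, x, bandwidth)
    k_yy, _ = rbf_and_grad(y, y, bandwidth)
    k_xy, _ = rbf_and_grad(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def mmd_flow(particles, target, bandwidth=1.0, step=0.1, n_steps=500):
    """Explicit Euler discretization of the flow dz/dt = -grad f(z)."""
    z = particles.copy()
    for _ in range(n_steps):
        z = z - step * witness_gradient(z, target, bandwidth)
    return z


rng = np.random.default_rng(0)
target = 0.5 * rng.standard_normal((200, 2)) + np.array([2.0, 0.0])  # toy target
init = rng.standard_normal((200, 2))                                 # toy source

moved = mmd_flow(init, target)
mmd_before = mmd2(init, target, 1.0)
mmd_after = mmd2(moved, target, 1.0)
print(f"MMD^2 before: {mmd_before:.4f}, after: {mmd_after:.4f}")
```

With a fixed kernel the particles are attracted to the target samples and repelled from each other. The paper's motivation for adapting the kernel along the flow is that a single fixed bandwidth can behave poorly when the source and target are far apart or in high dimension.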

Paper Structure (44 sections, 7 theorems, 90 equations, 8 figures, 5 tables, 4 algorithms)

Introduction
Background
MMD GAN.
Wasserstein gradient flows.
MMD gradient flow.
A motivation for adaptive kernels
Diffusion Maximum Mean Discrepancy Gradient Flow
Adversarial-free training of noise conditional discriminators
Adaptive gradient flow sampling
Final denoising.
Scalable $\mathrm{DMMD}$ with linear kernel
Approximate sampling procedure.
f-divergences
Related Work
Adversarial training and $\mathrm{MMD}$-GAN.
...and 29 more sections

Key Result

Proposition 3.1

For any $\mu_0 \in \mathbb{R}^d$ and $\sigma >0$, let $\alpha^\star$ be given by Then, we have that

Figures (8)

Figure 1: Samples from $\mathrm{MMD}$ Gradient flow with different parameters for the RBF kernel \ref{eq:noise_conditional_rbf_kernel}.
Figure 2: Qualitative behaviour of $\mathrm{MMD}$ discriminators. Left, learned RBF kernel \ref{eq:noise_conditional_rbf_kernel} widths $\sigma(t)$ as a function of noise level $t$. Center, parameter $\sigma$ for $\mathrm{MMD}$-GAN as a function of training iteration. Right, $\mathrm{MMD}^2(P_t, P; t)$ for different methods.
Figure 3: Evolution of the norm of the mean $\mu_t$ of the Gaussian distribution $\pi_{\mu_t, \sigma}$ according to a gradient flow on the mean $\mu_t$ w.r.t. $\mathrm{MMD}_{\alpha_t}$. In the adaptive case, $\alpha_t$ is given by \ref{prop:optimal_kernel_gradient}, while in the non-adaptive case, $\alpha_t = \alpha_0 = 1$. In our experiment we consider $d=1$ and $\sigma=1$, for illustration purposes.
Figure 4: CIFAR-10 samples from $\mathrm{DMMD}$ with NFE=250 on the left and with NFE=100 on the right.
Figure 5: CIFAR-10 samples from $\mathrm{DMMD}$ with NFE=100 on the left and samples from $a$-$\mathrm{DMMD}$-$e$ with NFE=50 on the right.
...and 3 more figures

Theorems & Definitions (11)

Proposition 3.1
Proposition 6.1
Proposition 6.2
proof
Proposition 6.3
proof
Proposition 6.4
proof
Proposition 6.5
Proposition 6.6
...and 1 more
