Deep MMD Gradient Flow without adversarial training
Alexandre Galashov, Valentin de Bortoli, Arthur Gretton
TL;DR
This work introduces Diffusion-MMD Gradient Flow (DMMD), a non-adversarial generative framework that trains a noise-conditioned MMD discriminator along a forward diffusion path and generates samples with the corresponding Wasserstein gradient flow. By adapting the kernel to the noise level and using a diffusion-inspired training curriculum, DMMD achieves competitive unconditional image generation on CIFAR-10, MNIST, CelebA-64, and LSUN-Church-64 while avoiding adversarial training altogether. The authors give a theoretical justification for adaptive kernels, make the method scalable with a linear base kernel, and describe an approximate sampling variant and a KALE-flow extension. Empirically, DMMD outperforms several discriminator-flow baselines and shows that discriminative flows are a viable alternative to GANs and diffusion models for high-dimensional generation. The approach offers a principled, non-adversarial way to transport particles toward a target distribution, with potential applicability to larger datasets and diffusion-model contexts.
Abstract
We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 x 64) and LSUN Church (64 x 64). Furthermore, we demonstrate the validity of the approach when MMD is replaced by a lower bound on the KL divergence.
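To make the particle-transport idea concrete, the following is a minimal sketch of a plain MMD gradient flow on 2-D toy data. It is not the DMMD method itself: it assumes a fixed Gaussian (RBF) kernel instead of the learned, noise-adaptive kernel, and the bandwidth, step size, particle count, and all function names (`mmd2`, `witness_grad`, `mmd_gradient_flow`, `run_demo`) are illustrative choices rather than anything taken from the paper.

```python
import numpy as np


def pairwise_diffs(a, b):
    """Return a[i] - b[j] for all pairs, and the squared distances."""
    diff = a[:, None, :] - b[None, :, :]           # shape (n, m, d)
    return diff, np.sum(diff ** 2, axis=-1)         # shapes (n, m, d), (n, m)


def mmd2(x, y, h):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def mean_k(a, b):
        _, sq = pairwise_diffs(a, b)
        return np.exp(-sq / (2.0 * h ** 2)).mean()
    return mean_k(x, x) + mean_k(y, y) - 2.0 * mean_k(x, y)


def witness_grad(z, particles, target, h):
    """Gradient at points z of the witness f(z) = E_particles k(z, .) - E_target k(z, .)."""
    def mean_grad_k(a, b):
        diff, sq = pairwise_diffs(a, b)
        k = np.exp(-sq / (2.0 * h ** 2))
        # d/da exp(-||a - b||^2 / (2 h^2)) = -k * (a - b) / h^2
        return (-(k[..., None] * diff) / h ** 2).mean(axis=1)
    return mean_grad_k(z, particles) - mean_grad_k(z, target)


def mmd_gradient_flow(particles, target, h=2.0, step=0.5, n_steps=200):
    """Explicit Euler discretisation: move each particle along -grad f."""
    z = particles.copy()
    for _ in range(n_steps):
        z = z - step * witness_grad(z, z, target, h)
    return z


def run_demo(seed=0):
    """Toy setup (assumed): source N(0, I), target N((2, 2), 0.25 I)."""
    rng = np.random.default_rng(seed)
    source = rng.normal(size=(128, 2))
    target = 0.5 * rng.normal(size=(128, 2)) + np.array([2.0, 2.0])
    moved = mmd_gradient_flow(source, target)
    return mmd2(source, target, 2.0), mmd2(moved, target, 2.0)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"MMD^2 before: {before:.4f}, after: {after:.4f}")
```

In this sketch the kernel stays fixed throughout the flow. The abstract's proposal differs in that the MMD is trained on data corrupted at several noise levels, so the gradient field applied to the particles can change as they move from the source toward the target.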
