Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

Jiarui Cao; Zixuan Wei; Yuxin Liu

TL;DR

The paper proves that the drifting field of the Drifting Model equals, up to a bandwidth-squared scaling factor, the difference of KDE log-density gradients, which is exactly the particle velocity field of the Wasserstein-2 gradient flow of $KL(q\|p)$ with KDE-approximated densities.
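
For context, the velocity field in this statement follows from the standard Wasserstein-2 gradient-flow calculus. The sketch below works with generic smooth, positive densities $q_t$ and $p$; the symbols $\mathcal{F}$, $q_t$, and $v_t$ are notation assumed here for illustration, not taken from the paper.

```latex
\begin{align*}
\mathcal{F}(q) &= \mathrm{KL}(q\|p)
  = \int q(\mathbf{x}) \log\frac{q(\mathbf{x})}{p(\mathbf{x})}\,\mathrm{d}\mathbf{x},
& \frac{\delta \mathcal{F}}{\delta q}(\mathbf{x})
  &= \log q(\mathbf{x}) - \log p(\mathbf{x}) + 1, \\
\partial_t q_t + \nabla\cdot(q_t v_t) &= 0,
& v_t &= -\nabla \frac{\delta \mathcal{F}}{\delta q}(q_t)
  = \nabla \log p - \nabla \log q_t .
\end{align*}
```

Substituting $p_{\mathrm{kde}}$ and $q_{\mathrm{kde}}$ for $p$ and $q_t$ gives the difference of KDE log-density gradients referred to in the statement.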

Abstract

We present a precise mathematical framework for a new family of generative models, which we call Gradient Flow Drifting. Within this framework, we prove an equivalence between the recently proposed Drifting Model and the Wasserstein gradient flow of the forward KL divergence under kernel density estimation (KDE) approximation. Specifically, we prove that the drifting field of the Drifting Model (arXiv:2602.04770) equals, up to a bandwidth-squared scaling factor, the difference of KDE log-density gradients $\nabla \log p_{\mathrm{kde}} - \nabla \log q_{\mathrm{kde}}$, which is exactly the particle velocity field of the Wasserstein-2 gradient flow of $KL(q\|p)$ with KDE-approximated densities. This broad family of generative models also includes MMD-based generators, which arise as special cases of Wasserstein gradient flows of different divergences under KDE approximation. We provide a concise identifiability proof and a theoretically grounded mixed-divergence strategy. We combine reverse KL and $\chi^2$ divergence gradient flows to simultaneously avoid mode collapse and mode blurring. We also extend the method to Riemannian manifolds, which loosens the constraints on the kernel function and makes the method better suited to semantic spaces. Preliminary experiments on synthetic benchmarks validate the framework.
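
To make the velocity field $\nabla \log p_{\mathrm{kde}} - \nabla \log q_{\mathrm{kde}}$ concrete, here is a minimal NumPy sketch. It assumes an isotropic Gaussian kernel with bandwidth `h`, which is only one possible kernel choice, and the function names `kde_score` and `kl_flow_velocity` are hypothetical. It is an illustration of the formula, not the authors' implementation. With a Gaussian kernel, $h^2 \nabla \log \mu_{\mathrm{kde}}(\mathbf{x})$ is the kernel-weighted mean of the samples minus $\mathbf{x}$ (the classical mean-shift vector), which is where the bandwidth-squared factor shows up.

```python
import numpy as np

def kde_score(x, samples, h):
    """Gradient of the log Gaussian-KDE density built from `samples`, evaluated at rows of `x`.

    x: (n, d) query points; samples: (m, d) KDE centers; h: bandwidth (assumed Gaussian kernel).
    """
    diff = samples[None, :, :] - x[:, None, :]           # (n, m, d), entries y_j - x_i
    logits = -np.sum(diff ** 2, axis=-1) / (2.0 * h ** 2)
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    w = np.exp(logits)
    w = w / w.sum(axis=1, keepdims=True)                 # normalized kernel weights
    # Weighted mean of (y_j - x_i), divided by h^2, is the KDE score.
    return np.einsum("nm,nmd->nd", w, diff) / h ** 2

def kl_flow_velocity(particles, data, h):
    """Particle velocity grad log p_kde - grad log q_kde, with q_kde built from the particles."""
    return kde_score(particles, data, h) - kde_score(particles, particles, h)
```

A particle update under this sketch would be an explicit Euler step such as `particles = particles + step * kl_flow_velocity(particles, data, h)`, where `step` is a hypothetical step size. The first term pulls particles toward the data, and the second pushes them apart from one another.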

Paper Structure (50 sections, 24 theorems, 38 equations, 1 figure, 3 tables, 1 algorithm)


Introduction
Our key observation.
Related Work
Drifting Models.
Wasserstein gradient flows in generative modeling.
Kernel density estimation and score estimation.
MMD and kernel methods for generation.
$f$-divergence minimization.
Preliminaries
Notation
Kernel Density Estimation
Reproducing Kernel Hilbert Spaces
Wasserstein Gradient Flows
The Drifting Model
Method: Gradient Flow Drifting
...and 35 more sections

Key Result

Theorem 4.2

Under K2--K4, for any $\mu \in \mathcal{P}(\mathbb{R}^d)$: (i) $\mu_\mathrm{kde} \in C^1(\mathbb{R}^d)$ with $\nabla_\mathbf{x} \mu_\mathrm{kde}(\mathbf{x}) = \int \nabla_\mathbf{x} k(\mathbf{x},\mathbf{y})\mathrm{d}\mu(\mathbf{y})$; (ii) $\mu_\mathrm{kde}(\mathbf{x}) > 0$ for all $\mathbf{x}$; (iii

Figures (1)

Figure 1: Training results using the gradient-flow velocity field under different choices of divergence and kernel function on a 2D toy dataset.

Theorems & Definitions (53)

Definition 3.1: KDE operator
Definition 3.2: RKHS and kernel mean embedding
Definition 3.3: Characteristic kernel
Definition 3.4: Wasserstein-2 gradient flow
Theorem 4.2: KDE regularity; proof in Appendix \ref{app:regularity}
Proposition 4.3: KDE injectivity; proof in Appendix \ref{app:injectivity}
Remark 4.4: Foundation summary
Theorem 4.5: Energy dissipation
Remark 4.6: Factored velocity structure
Theorem 4.7: Unified identifiability; proof in Appendix \ref{app:proof-ident}
...and 43 more
