Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

Jiarui Cao, Zixuan Wei, Yuxin Liu

TL;DR

We prove that the drifting field of the Drifting Model equals, up to a bandwidth-squared scaling factor, the difference of KDE log-density gradients, which is exactly the particle velocity field of the Wasserstein-2 gradient flow of $KL(q\|p)$ with KDE-approximated densities.

Abstract

We present a precise mathematical framework for a new family of generative models, which we call Gradient Flow Drifting. Within this framework, we prove an equivalence between the recently proposed Drifting Model and the Wasserstein gradient flow of the forward KL divergence under kernel density estimation (KDE) approximation. Specifically, we prove that the drifting field of the Drifting Model (arXiv:2602.04770) equals, up to a bandwidth-squared scaling factor, the difference of KDE log-density gradients $\nabla \log p_{\mathrm{kde}} - \nabla \log q_{\mathrm{kde}}$, which is exactly the particle velocity field of the Wasserstein-2 gradient flow of $KL(q\|p)$ with KDE-approximated densities. This broad family also includes MMD-based generators, which arise as special cases of Wasserstein gradient flows of different divergences under KDE approximation. We provide a concise identifiability proof and a theoretically grounded mixed-divergence strategy: we combine reverse KL and $\chi^2$ divergence gradient flows to simultaneously avoid mode collapse and mode blurring. We also extend the method to Riemannian manifolds, which loosens the constraints on the kernel function and makes the method better suited to semantic spaces. Preliminary experiments on synthetic benchmarks validate the framework.
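The bandwidth-squared relation can be checked numerically in a simple setting. The sketch below assumes a Gaussian kernel with bandwidth `h` and a mean-shift form of the drifting field (kernel-weighted mean of data samples minus kernel-weighted mean of generated samples, as seen from a query point). Neither choice is spelled out in this summary, and all function names are illustrative rather than taken from the paper. The field is compared against $h^2(\nabla \log p_{\mathrm{kde}} - \nabla \log q_{\mathrm{kde}})$, with the log-density gradients obtained independently by central finite differences.

```python
import numpy as np

def log_kde(x, samples, h):
    """Log of a Gaussian KDE at each row of x (kernel normalizer omitted; it cancels in gradients)."""
    sq = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(-1)   # (m, n) squared distances
    a = -sq / (2.0 * h**2)
    m = a.max(axis=1, keepdims=True)                            # stabilize the log-sum-exp
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1)) - np.log(samples.shape[0])

def numerical_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of a row-wise scalar function f."""
    g = np.zeros_like(x)
    for j in range(x.shape[1]):
        e = np.zeros(x.shape[1])
        e[j] = eps
        g[:, j] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def mean_shift(x, samples, h):
    """Kernel-weighted mean of the samples minus the query point (assumed form)."""
    sq = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    a = -sq / (2.0 * h**2)
    w = np.exp(a - a.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return w @ samples - x

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(256, 2))    # stand-in samples from p
gen = rng.normal(loc=-1.0, size=(256, 2))    # stand-in samples from q
x = gen[:8]                                  # query points
h = 0.7

# Assumed drifting field: attraction toward data minus attraction toward generated samples.
v_drift = mean_shift(x, data, h) - mean_shift(x, gen, h)

# Difference of KDE log-density gradients, computed without using the mean-shift formula.
v_flow = (numerical_grad(lambda z: log_kde(z, data, h), x)
          - numerical_grad(lambda z: log_kde(z, gen, h), x))

print(np.max(np.abs(v_drift - h**2 * v_flow)))  # discrepancy between the two fields
```

Under these assumptions the two fields agree up to finite-difference error, so the only difference between them is the factor `h**2`.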

Paper Structure

This paper contains 50 sections, 24 theorems, 38 equations, 1 figure, 3 tables, 1 algorithm.

Key Result

Theorem 4.2

Under K2--K4, for any $\mu \in \mathcal{P}(\mathbb{R}^d)$: (i) $\mu_\mathrm{kde} \in C^1(\mathbb{R}^d)$ with $\nabla_\mathbf{x} \mu_\mathrm{kde}(\mathbf{x}) = \int \nabla_\mathbf{x} k(\mathbf{x},\mathbf{y})\mathrm{d}\mu(\mathbf{y})$; (ii) $\mu_\mathrm{kde}(\mathbf{x}) > 0$ for all $\mathbf{x}$; (iii
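As an illustration of how parts (i) and (ii) combine, consider the Gaussian kernel $k(\mathbf{x},\mathbf{y}) = \exp(-\|\mathbf{x}-\mathbf{y}\|^2/(2h^2))$ with bandwidth $h$. This kernel is an assumed example: whether it meets K2--K4 as stated in the paper is not checked here, the normalizing constant is dropped, and $\mu_\mathrm{kde}(\mathbf{x}) = \int k(\mathbf{x},\mathbf{y})\,\mathrm{d}\mu(\mathbf{y})$ is taken as the KDE, consistent with (i). Differentiating under the integral as in (i), then dividing by the strictly positive density from (ii), gives the KDE score in mean-shift form:

```latex
\begin{aligned}
\nabla_\mathbf{x} k(\mathbf{x},\mathbf{y})
  &= -\frac{\mathbf{x}-\mathbf{y}}{h^2}\, k(\mathbf{x},\mathbf{y}), \\
\nabla_\mathbf{x} \mu_\mathrm{kde}(\mathbf{x})
  &= \frac{1}{h^2} \int (\mathbf{y}-\mathbf{x})\, k(\mathbf{x},\mathbf{y})\, \mathrm{d}\mu(\mathbf{y}), \\
\nabla_\mathbf{x} \log \mu_\mathrm{kde}(\mathbf{x})
  &= \frac{1}{h^2} \left(
       \frac{\int \mathbf{y}\, k(\mathbf{x},\mathbf{y})\, \mathrm{d}\mu(\mathbf{y})}
            {\int k(\mathbf{x},\mathbf{y})\, \mathrm{d}\mu(\mathbf{y})}
       - \mathbf{x} \right).
\end{aligned}
```

The bracketed term is the kernel-weighted mean shift at $\mathbf{x}$, so for this kernel the score is that shift divided by $h^2$. This is one way to see where a bandwidth-squared factor can arise.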

Figures (1)

  • Figure 1: Training results with the velocity field of gradient flow under different implementations of divergence and kernel function on 2D-toy dataset.
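To make the idea of a particle velocity field concrete, here is a minimal sketch of an explicit-Euler particle flow on a 2D toy target. It is not the paper's training procedure (no generator network is trained). The Gaussian kernel, the two-mode target, the bandwidth, the step size, and all function names are assumptions chosen for illustration. Each particle moves along $\nabla \log p_{\mathrm{kde}} - \nabla \log q_{\mathrm{kde}}$, where $q$ is the current particle cloud.

```python
import numpy as np

def kde_score(x, samples, h):
    """Gradient of the log Gaussian KDE of `samples`, evaluated at each row of x."""
    sq = ((x[:, None, :] - samples[None, :, :]) ** 2).sum(-1)
    a = -sq / (2.0 * h**2)
    w = np.exp(a - a.max(axis=1, keepdims=True))   # stabilized kernel weights
    w /= w.sum(axis=1, keepdims=True)
    return (w @ samples - x) / h**2                # mean shift divided by h^2

def flow_step(particles, data, h, step):
    """One explicit Euler step: attraction to the data KDE, repulsion from the particles' own KDE."""
    v = kde_score(particles, data, h) - kde_score(particles, particles, h)
    return particles + step * v

def nearest_center_dist(x, centers):
    """Distance from each point to its closest mode center."""
    return np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=-1).min(axis=1)

rng = np.random.default_rng(0)
centers = np.array([[-2.0, 0.0], [2.0, 0.0]])                        # hypothetical two-mode target
data = centers[rng.integers(0, 2, size=400)] + 0.3 * rng.normal(size=(400, 2))
particles = rng.normal(size=(200, 2))                                # initial particle cloud
start = particles.copy()

h, step = 0.5, 0.05                                                  # illustrative values only
for _ in range(300):
    particles = flow_step(particles, data, h, step)

print(nearest_center_dist(start, centers).mean(),
      nearest_center_dist(particles, centers).mean())
```

The attraction term alone is a damped mean-shift iteration toward the data modes. The repulsion term is what keeps the particles from all collapsing onto the same points.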

Theorems & Definitions (53)

  • Definition 3.1: KDE operator
  • Definition 3.2: RKHS and kernel mean embedding
  • Definition 3.3: Characteristic kernel
  • Definition 3.4: Wasserstein-2 gradient flow
  • Theorem 4.2: KDE regularity; proof in Appendix \ref{app:regularity}
  • Proposition 4.3: KDE injectivity; proof in Appendix \ref{app:injectivity}
  • Remark 4.4: Foundation summary
  • Theorem 4.5: Energy dissipation
  • Remark 4.6: Factored velocity structure
  • Theorem 4.7: Unified identifiability; proof in Appendix \ref{app:proof-ident}
  • ...and 43 more