Generative Modeling via Drifting

Mingyang Deng; He Li; Tianhong Li; Yilun Du; Kaiming He

TL;DR

Drifting Models evolve the pushforward distribution $q = (f_\theta)_{\#} p_{\boldsymbol{\epsilon}}$ during training via a drifting field $\mathbf{V}_{p,q}$ that vanishes at equilibrium, when $q = p_{\text{data}}$. The result is a one-step generator that samples in a single forward pass. The drift is kernelized: generated samples are attracted to data and repelled from other generated samples, and the network is trained with a fixed-point objective whose targets are held constant by stop-gradient, so that $q$ aligns with $p_{\text{data}}$. The method extends drifting to feature space, supports multi-scale representations, and incorporates classifier-free guidance using both class-conditional and unconditional data. It achieves state-of-the-art 1-NFE FID on ImageNet 256×256 in both latent space ($\mathrm{FID}=1.54$) and pixel space ($\mathrm{FID}=1.61$), and also performs well on robotics control, illustrating a practical, diffusion-free paradigm for high-quality, efficient generation.
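
One way to write the fixed-point objective described above is sketched here. The stop-gradient symbol $\operatorname{sg}(\cdot)$ and the exact form of the expectation are notational assumptions of this summary, not taken verbatim from the paper:

$$
\mathcal{L}(\theta) = \mathbb{E}_{\boldsymbol{\epsilon}}\Big[\big\| f_\theta(\boldsymbol{\epsilon}) - \operatorname{sg}\big(f_\theta(\boldsymbol{\epsilon}) + \mathbf{V}_{p,q}(f_\theta(\boldsymbol{\epsilon}))\big) \big\|^2\Big]
$$

Under this form the target differs from the prediction by exactly $\mathbf{V}_{p,q}$, so the loss value is $\mathbb{E}\,\|\mathbf{V}_{p,q}\|^2$, which agrees with the Figure 4 caption. Because the target is frozen, the gradient with respect to a generated sample $\mathbf{x} = f_\theta(\boldsymbol{\epsilon})$ is $-2\,\mathbf{V}_{p,q}(\mathbf{x})$, so a descent step nudges each sample along its drift.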

Abstract

Generative modeling can be formulated as learning a mapping $f$ such that its pushforward distribution matches the data distribution. The pushforward behavior can be carried out iteratively at inference time, for example in diffusion and flow-based models. In this paper, we propose a new paradigm called Drifting Models, which evolve the pushforward distribution during training and naturally admit one-step inference. We introduce a drifting field that governs the sample movement and achieves equilibrium when the distributions match. This leads to a training objective that allows the neural network optimizer to evolve the distribution. In experiments, our one-step generator achieves state-of-the-art results on ImageNet at 256×256 resolution, with an FID of 1.54 in latent space and 1.61 in pixel space. We hope that our work opens up new opportunities for high-quality one-step generation.

Paper Structure

This paper contains 82 sections, 1 theorem, 53 equations, 15 figures, 11 tables, 2 algorithms.

Introduction
Related Work
Diffusion-/Flow-based Models.
Generative Adversarial Networks (GANs).
Variational Autoencoders (VAEs).
Normalizing Flows (NFs).
Moment Matching.
Contrastive Learning.
Drifting Models for Generation
Pushforward at Training Time
Drifting Field for Training
Training Objective.
Designing the Drifting Field
Kernel.
Equilibrium and Matched Distributions.
...and 67 more sections

Key Result

Proposition 3.1

Consider an anti-symmetric drifting field: $\mathbf{V}_{p,q}(\mathbf{x}) = -\mathbf{V}_{q,p}(\mathbf{x}), \forall \mathbf{x}$. Then we have: $\quad q=p \quad \Rightarrow \quad \mathbf{V}_{p,q}(\mathbf{x}) = \mathbf{0},\forall \mathbf{x}$.
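
The implication follows in one line. This sketch reads anti-symmetry as $\mathbf{V}_{p,q}(\mathbf{x}) = -\mathbf{V}_{q,p}(\mathbf{x})$ for all $\mathbf{x}$, the usual meaning of the term. Setting $q = p$ makes both subscripts equal:

$$
\mathbf{V}_{p,p}(\mathbf{x}) = -\mathbf{V}_{p,p}(\mathbf{x})
\;\Rightarrow\; 2\,\mathbf{V}_{p,p}(\mathbf{x}) = \mathbf{0}
\;\Rightarrow\; \mathbf{V}_{p,q}(\mathbf{x})\big|_{q=p} = \mathbf{0}, \quad \forall \mathbf{x}.
$$

The statement runs in one direction only: matched distributions imply zero drift. The proposition as quoted here does not claim the converse.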

Figures (15)

Figure 1: Drifting Model. A network $f$ performs a pushforward operation: $q={f}_\# p_{\text{prior}}$, mapping a prior distribution $p_{\text{prior}}$ (e.g., Gaussian, not shown here) to a pushforward distribution $q$ (orange). The goal of training is to approximate the data distribution $p_{\text{data}}$ (blue). As training iterates, we obtain a sequence of models $\{f_i\}$, which corresponds to a sequence of pushforward distributions $\{q_i\}$. Our Drifting Model focuses on the evolution of this pushforward distribution at training time. We introduce a drifting field (detailed in main text) that approaches zero when $q$ matches $p_{\text{data}}$. This drifting field provides a loss function (y-axis, in log scale) for training.
Figure 2: Illustration of drifting a sample. A generated sample $\mathbf{x}$ (black) drifts according to a vector: $\mathbf{V}=\mathbf{V}^+_{p}-\mathbf{V}^-_{q}$. Here, $\mathbf{V}^+_{p}$ is the mean-shift vector of the positive samples (blue) and $\mathbf{V}^-_{q}$ is the mean-shift vector of the negative samples (orange); see the mean-shift equation in the main text. $\mathbf{x}$ is attracted by $\mathbf{V}^+_{p}$ and repelled by $\mathbf{V}^-_{q}$.
Figure 3: Evolution of the generated distribution. The distribution $q$ (orange) evolves toward a bimodal target $p$ (blue) during training. We show three initializations of $q$: (top): initialized between the two modes; (middle): initialized far from both modes; (bottom): initialized collapsed onto one mode. Across all initializations, our method approximates the target distribution without mode collapse.
Figure 4: Evolution of samples. We show generated points sampled at different training iterations, along with their loss values. The loss (whose value equals $\|\mathbf{V}\|^2$) decreases as the distribution converges to the target. (The y-axis is in log scale.)
Figure 5: Effect of CFG scale $\alpha$. (a): FID vs. $\alpha$. (b): IS vs. $\alpha$. (c): IS vs. FID. We show the L/2 (solid) and B/2 (dashed) models. Consistent with common observations in diffusion-/flow-based models, the CFG scale effectively trades off distributional coverage (as reflected by FID) against per-image quality (measured by IS). Notably, with the L/2 model, the optimal FID is achieved at $\alpha{=}1.0$, which is often regarded as "w/o CFG" in diffusion-/flow-based models. For B/2, the optimal FID is achieved at $\alpha{=}1.1$.
...and 10 more figures
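
The drift in Figure 2 and the loss in Figure 4 can be illustrated with a small toy, given here as a minimal sketch rather than the paper's implementation. Several things are assumptions made for illustration: the Gaussian kernel, the bandwidth, the step size, the 1-D data, and all function names. No network is trained; the generated samples themselves play the role of the parameters, and each one is moved toward a frozen target $\mathbf{x} + \mathbf{V}(\mathbf{x})$, mimicking a stop-gradient update.

```python
import math


def kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two scalars (an assumed choice for this sketch)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mean_shift(x, samples, bandwidth=1.0):
    """Kernel-weighted average displacement from x toward `samples`."""
    weights = [kernel(x, y, bandwidth) for y in samples]
    total = sum(weights)
    return sum(w * (y - x) for w, y in zip(weights, samples)) / total


def drift(x, positives, negatives, bandwidth=1.0):
    """V = V_p^+ - V_q^-: pull toward data minus pull toward generated samples."""
    return mean_shift(x, positives, bandwidth) - mean_shift(x, negatives, bandwidth)


def loss(generated, data, bandwidth=1.0):
    """Mean squared drift over the generated samples, i.e. the average of |V|^2."""
    drifts = [drift(x, data, generated, bandwidth) for x in generated]
    return sum(v * v for v in drifts) / len(drifts)


def drift_step(generated, data, step=0.5, bandwidth=1.0):
    """One update toward frozen targets x + V(x), computed from the current samples."""
    # Targets are computed once and then treated as constants (the stop-gradient role).
    targets = [x + drift(x, data, generated, bandwidth) for x in generated]
    # The gradient of 0.5 * (x - target)^2 w.r.t. x is (x - target) = -V,
    # so this descent step moves each sample by step * V.
    return [x - step * (x - t) for x, t in zip(generated, targets)]


# Toy bimodal target and an initialization between the two modes (hypothetical values).
data = [-2.0, 2.0]
generated = [-0.5, 0.5]

initial_loss = loss(generated, data)
for _ in range(50):
    generated = drift_step(generated, data)
final_loss = loss(generated, data)

print("initial loss:", initial_loss)
print("final loss:", final_loss)
print("final samples:", generated)
```

In this toy the two generated points separate and settle on the two data points, where attraction and repulsion cancel and the drift is zero. Passing the same list as both positives and negatives to `drift` returns zero, which is the equilibrium property stated in Proposition 3.1.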

Theorems & Definitions (1)

Proposition 3.1
