
The Generation Phases of Flow Matching: a Denoising Perspective

Anne Gagneux, Ségolène Martin, Rémi Gribonval, Mathurin Massias

TL;DR

Flow matching can be understood through a denoising lens, which enables principled comparisons with denoisers at every noise level. The authors build a denoising toolkit that maps between denoisers and velocities, and they design drift and noise perturbations to probe the generation dynamics. They show that different denoising losses and parametrizations, although theoretically equivalent under perfect optimization, produce distinct generation and denoising behaviors. The experiments also reveal temporally distinct phases: an early phase that is sensitive to drift and a late phase that is sensitive to noise. The findings highlight the importance of intermediate times for generation quality and offer a principled way to customize generation dynamics through controlled training and perturbations.
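The velocity/denoiser correspondence mentioned above can be sketched in a few lines. This is a minimal sketch assuming the common linear interpolation x_t = t x_1 + (1 - t) x_0, with Gaussian noise x_0 at t = 0 and data x_1 at t = 1. This summary does not state the paper's time convention, and the function names are hypothetical. Under that assumption, D_t(x) = x + (1 - t) v_t(x), and conversely v_t(x) = (D_t(x) - x) / (1 - t) for t < 1.

```python
import numpy as np

# Assumed convention (not taken from the paper): x_t = t * x1 + (1 - t) * x0,
# with x0 ~ N(0, I) noise at t = 0 and x1 a data sample at t = 1.
# Then v_t(x) = E[x1 - x0 | x_t = x] and D_t(x) = E[x1 | x_t = x].


def velocity_from_denoiser(d, x, t):
    """Velocity implied by a denoiser output d = D_t(x); requires t < 1."""
    d, x = np.asarray(d, dtype=float), np.asarray(x, dtype=float)
    return (d - x) / (1.0 - t)


def denoiser_from_velocity(v, x, t):
    """Denoiser implied by a velocity output v = v_t(x); valid for t in [0, 1]."""
    v, x = np.asarray(v, dtype=float), np.asarray(x, dtype=float)
    return x + (1.0 - t) * v
```

The velocity-from-denoiser direction is singular at t = 1, which is why the sketch requires t < 1 there. For a single training pair, plugging x_1 in as the "denoised" output recovers the regression target x_1 - x_0.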

Abstract

Flow matching has achieved remarkable success, yet the factors influencing the quality of its generation process remain poorly understood. In this work, we adopt a denoising perspective and design a framework to empirically probe the generation process. Laying down the formal connections between flow matching models and denoisers, we provide a common ground to compare their performance on generation and denoising. This enables the design of principled and controlled perturbations to influence sample generation: noise and drift. This leads to new insights into the distinct dynamical phases of the generative process, enabling us to precisely characterize at which stage of the generative process denoisers succeed or fail and why this matters.
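To make the idea of drift and noise perturbations concrete, here is a hypothetical sketch, not the authors' protocol. It uses an Euler sampler on a toy Gaussian dataset, where the optimal denoiser is known in closed form. A constant drift is added to the velocity, and/or Gaussian noise is injected, only inside a chosen time window. The toy data (`MU`, `S`), the window mechanism and the perturbation parameterization are all assumptions. The interpolation convention is the assumed x_t = t x_1 + (1 - t) x_0.

```python
import numpy as np

# Toy setting (assumption, for illustration only): data x1 ~ N(MU, S^2 I),
# noise x0 ~ N(0, I), x_t = t * x1 + (1 - t) * x0.
MU, S = 2.0, 0.5


def gaussian_denoiser(x, t):
    """Closed-form D_t(x) = E[x1 | x_t = x] for the Gaussian toy data."""
    gain = t * S**2 / (t**2 * S**2 + (1.0 - t) ** 2)
    return MU + gain * (x - t * MU)


def sample(n, dim, steps=100, drift=0.0, noise=0.0, window=(0.0, 1.0), seed=0):
    """Euler sampling of dx/dt = v_t(x) from t = 0 to t = 1.

    Hypothetical perturbations, active only while window[0] <= t < window[1]:
      drift: constant offset added to the velocity;
      noise: Gaussian increment with std noise * sqrt(dt) added at each step.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))  # start from pure noise at t = 0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # last evaluation is at t = 1 - dt, so 1 - t > 0
        v = (gaussian_denoiser(x, t) - x) / (1.0 - t)  # velocity from denoiser
        active = window[0] <= t < window[1]
        if active:
            v = v + drift
        x = x + dt * v
        if active and noise > 0:
            x = x + noise * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x
```

For example, `sample(20000, 2, drift=1.0, window=(0.0, 0.5))` perturbs only the first half of the trajectory, while `sample(20000, 2, noise=1.0, window=(0.8, 1.0))` perturbs only the end. Because this toy is linear and Gaussian, it should not be expected to reproduce the phase effects reported for image models.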

Paper Structure

This paper contains 46 sections, 32 equations, 27 figures, 4 tables.

Figures (27)

  • Figure 1: Equivalence between velocity $v_t$ and denoiser $D_t$. Learning the optimal velocity amounts to learning an optimal denoiser at every time $t$.
  • Figure 2: PSNR and FID for the different losses and parametrizations, CIFAR-10. Models that reach the highest PSNR (i.e., the smallest PSNR gap to standard FM) also reach the lowest FID.
  • Figure 3: Inpainting results in terms of PSNR (higher is better) as a function of time in PnP-Flow, CelebA-64. Results are averaged over 100 images. Mask size: $17\times 17$. The horizontal black line shows, for reference, the PSNR of the degraded image.
  • Figure 4: Influence of different perturbations at different generation phases on the FID (10K, test) on CelebA 128. Two classes of perturbations emerge. High-frequency perturbations (checkerboard (ckb.) $4\times 4$, $8\times 8$) are characterized by high FID, low pairwise distance and the strongest impact at the latest times. Low-frequency perturbations (positive/negative shift, ckb. $16\times 16$ to $64\times 64$) are characterized by low FID, high pairwise distance and the strongest impact at the earliest times.
  • Figure 5: Effect of low- and high-frequency perturbations at early and late times for CelebA 128.
  • ...and 22 more figures
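Figures 2 to 4 rely on PSNR and on structured perturbation patterns (checkerboards of several block sizes, and positive/negative constant shifts). The minimal sketch below builds such patterns and computes PSNR with its standard definition. This summary does not say how the paper scales these patterns or injects them into the sampler, so the `amplitude` parameter and the additive use are assumptions, and the function names are hypothetical.

```python
import numpy as np


def checkerboard(size, block, amplitude=1.0):
    """size x size pattern of +/- amplitude squares, each `block` pixels wide."""
    idx = np.arange(size) // block  # which block each row/column belongs to
    return amplitude * np.where((idx[:, None] + idx[None, :]) % 2 == 0, 1.0, -1.0)


def shift(size, amplitude=1.0):
    """Constant (zero-frequency) offset; a negative amplitude gives a negative shift."""
    return np.full((size, size), float(amplitude))


def psnr(x, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB between image x and reference ref."""
    mse = np.mean((np.asarray(x, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    return 10.0 * np.log10(data_range**2 / mse)
```

For instance, `x + checkerboard(128, 4, 0.1)` adds a high-frequency pattern to a 128×128 image channel, while `checkerboard(128, 64, 0.1)` or `shift(128, -0.1)` adds a low-frequency one.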

Theorems & Definitions (3)

  • Remark 1: Equivalence of generative and classical denoisers
  • Remark 2
  • Remark 3
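This summary only names Remark 1. A standard form of the equivalence between generative and classical denoisers can be derived under the assumed convention x_t = t x_1 + (1 - t) x_0. Dividing by t > 0 gives x_t / t = x_1 + ((1 - t)/t) x_0. The generative denoiser D_t(x) therefore equals the classical MMSE denoiser at noise level σ = (1 - t)/t applied to x / t. The sketch below encodes that rescaling. It illustrates the idea and is not necessarily the paper's exact statement. The function names and the Gaussian example are hypothetical.

```python
import numpy as np

# Assumed convention (not confirmed by the summary): x_t = t * x1 + (1 - t) * x0,
# x0 ~ N(0, I). For t > 0, x_t / t = x1 + sigma * x0 with sigma = (1 - t) / t.


def sigma_from_t(t):
    """Noise std of the equivalent classical denoising problem (0 < t <= 1)."""
    return (1.0 - t) / t


def generative_from_classical(classical_denoiser, x, t):
    """Generative denoiser D_t(x) built from a classical denoiser D_sigma(y)."""
    return classical_denoiser(np.asarray(x, dtype=float) / t, sigma_from_t(t))


# Hypothetical example: exact MMSE denoiser, per coordinate, for data
# x1 ~ N(MU, S^2) observed as y = x1 + sigma * n with n ~ N(0, 1).
MU, S = 2.0, 0.5


def gaussian_mmse_denoiser(y, sigma):
    return MU + S**2 / (S**2 + sigma**2) * (y - MU)
```

Because dividing by t is invertible for t > 0, conditioning on x_t or on x_t / t gives the same posterior mean. For the Gaussian example this wrapper reproduces the closed-form generative denoiser MU + t S² / (t² S² + (1 - t)²) (x - t MU).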