Flow Map Distillation Without Data

Shangyuan Tong, Nanye Ma, Saining Xie, Tommi Jaakkola

TL;DR

This work challenges the conventional data-dependent paradigm for flow-map distillation by identifying Teacher-Data Mismatch as a fundamental risk. It introduces FreeFlow, a fully data-free predictor-corrector framework that distills flow maps from a pre-trained teacher using only samples from the prior $\pi$. The prediction step aligns the student's generating velocity with the teacher's velocity, and the correction step aligns the student's marginal noising distributions with the teacher's. Distilled from SiT-XL/2+REPA, FreeFlow reaches an FID of 1.45 on ImageNet 256x256 and 1.49 on ImageNet 512x512 with a single function evaluation (1-NFE), and it also supports efficient inference-time scaling. The results show that a data-free approach can surpass data-based methods while avoiding dataset-induced biases, which makes it a practical route to accelerating large generative models.

Abstract

State-of-the-art flow models achieve remarkable quality but require slow, iterative sampling. To accelerate this, flow maps can be distilled from pre-trained teachers, a procedure that conventionally requires sampling from an external dataset. We argue that this data-dependency introduces a fundamental risk of Teacher-Data Mismatch, as a static dataset may provide an incomplete or even misaligned representation of the teacher's full generative capabilities. This leads us to question whether this reliance on data is truly necessary for successful flow map distillation. In this work, we explore a data-free alternative that samples only from the prior distribution, a distribution the teacher is guaranteed to follow by construction, thereby circumventing the mismatch risk entirely. To demonstrate the practical viability of this philosophy, we introduce a principled framework that learns to predict the teacher's sampling path while actively correcting for its own compounding errors to ensure high fidelity. Our approach surpasses all data-based counterparts and establishes a new state-of-the-art by a significant margin. Specifically, distilling from SiT-XL/2+REPA, our method reaches an impressive FID of 1.45 on ImageNet 256x256, and 1.49 on ImageNet 512x512, both with only 1 sampling step. We hope our work establishes a more robust paradigm for accelerating generative models and motivates the broader adoption of flow map distillation without data.
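To make the contrast between iterative sampling and a one-step flow map concrete, here is a minimal toy sketch. It is not the paper's method or teacher: the linear velocity field `teacher_velocity`, the closed-form `flow_map`, and the convention that $t=0$ is the prior and $t=1$ is the data end are all assumptions chosen so that the exact flow map is known. The only inputs are samples from a Gaussian prior, mirroring the data-free setting.

```python
import numpy as np

def teacher_velocity(x, t):
    # Hypothetical toy "teacher": linear velocity field u(x, t) = -x.
    return -x

def euler_sample(x0, n_steps):
    # Iterative sampling: integrate dx/dt = u(x, t) from t=0 to t=1
    # with n_steps Euler steps (n_steps function evaluations).
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * teacher_velocity(x, k * dt)
    return x

def flow_map(x_t, t, s):
    # Exact flow map of the toy ODE: jumps from time t to time s in one call.
    # For u(x, t) = -x the solution is x_s = x_t * exp(-(s - t)).
    return x_t * np.exp(-(s - t))

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 2))          # samples from the prior only

one_step = flow_map(z, 0.0, 1.0)         # 1 evaluation
err_coarse = np.max(np.abs(euler_sample(z, 10) - one_step))
err_fine = np.max(np.abs(euler_sample(z, 1000) - one_step))
print(f"Euler error, 10 steps:   {err_coarse:.2e}")
print(f"Euler error, 1000 steps: {err_fine:.2e}")
```

In this toy the iterative sampler needs many evaluations to approach what the flow map returns in one. A distilled student plays the role of `flow_map` when no closed form exists, which is the regime the abstract targets.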

Paper Structure

This paper contains 42 sections, 1 theorem, 24 equations, 18 figures, 6 tables.

Key Result

Proposition A.1

Let ${\bm{\phi}}_{\bm{u}}({\bm{x}}_t,t,s)$ be defined as in eq:solution, and assume that there exists some $L>0$ such that, for all ${\bm{y}}, {\bm{y}}'\in\mathbb{R}^d$ and $r\in[0,1]$, $\|{\bm{u}}({\bm{y}},r)-{\bm{u}}({\bm{y}}',r)\|\leq L\|{\bm{y}}-{\bm{y}}'\|$. We further assume that ${\bm{f}}_\theta(\ldots)$ ...
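As background on the Lipschitz assumption above, the following is a standard consequence of Grönwall's inequality, not the statement of Proposition A.1 itself. It assumes that ${\bm{\phi}}_{\bm{u}}({\bm{y}},t,s)$ denotes the solution at time $s$ of $\mathrm{d}{\bm{x}}/\mathrm{d}r={\bm{u}}({\bm{x}},r)$ started from ${\bm{x}}(t)={\bm{y}}$, and takes $t\le s$; the symbol $\Delta$ is introduced here only for illustration.

```latex
% Sketch: an L-Lipschitz velocity field lets two trajectories
% separate at most exponentially fast in the elapsed time.
\begin{align*}
\Delta(r) &:= {\bm{\phi}}_{\bm{u}}({\bm{y}},t,r) - {\bm{\phi}}_{\bm{u}}({\bm{y}}',t,r),
  \qquad \Delta(t) = {\bm{y}} - {\bm{y}}', \\
\|\Delta(s)\| &\le \|\Delta(t)\|
  + \int_t^s \big\| {\bm{u}}({\bm{\phi}}_{\bm{u}}({\bm{y}},t,r),r)
  - {\bm{u}}({\bm{\phi}}_{\bm{u}}({\bm{y}}',t,r),r) \big\| \,\mathrm{d}r \\
  &\le \|\Delta(t)\| + L \int_t^s \|\Delta(r)\| \,\mathrm{d}r, \\
\|\Delta(s)\| &\le e^{L(s-t)} \, \|{\bm{y}} - {\bm{y}}'\|
  \qquad \text{(Gr\"onwall's inequality)}.
\end{align*}
```

Bounds of this type are what typically allow a small local mismatch in velocity to be converted into a controlled mismatch in the resulting flow map.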

Figures (18)

  • Figure 1: Teacher-Data Mismatch and the data-free alternative. (Top) Conventional data-based distillation relies on intermediate distributions ($\tilde{p}_t$) derived from a static dataset, which could be misaligned with the teacher's true generative distributions ($\hat{p}_t$). (Bottom) The data-free paradigm, in contrast, samples only from the prior ($\pi$), the single distribution with guaranteed alignment, thereby circumventing the mismatch risk by construction.
  • Figure 2: Impact of Teacher-Data Mismatch. With a fixed teacher model, increasing augmentation induces a more severe mismatch between teacher and data, degrading student performance.
  • Figure 3: Selected samples from FreeFlow-XL/2 model at $512\times512$ resolution with 1-NFE. More uncurated results are in \ref{sec:visual}.
  • Figure 4: Approximation errors accumulate as the prediction proceeds from noise to data.
  • Figure 5: Correction objective in \ref{eq:c_grad} aligns the student's noising velocity ${\bm{v}}_{\text{N}}$ with ${\bm{u}}$.
  • ...and 13 more figures

Theorems & Definitions (2)

  • Proposition A.1
  • proof