Path-Guided Flow Matching for Dataset Distillation

Xuhui Li; Zhengquan Luo; Xiwei Liu; Yongqiang Yu; Zhiqiang Xu

TL;DR

Dataset distillation seeks to compress large datasets into small, representative sets without sacrificing downstream performance. This work introduces Path-Guided Flow Matching (PGFM), the first flow-matching-based framework for generative dataset distillation. PGFM operates in the latent space of a frozen VAE and synthesizes samples by solving an ODE in a few steps. It adds lightweight prototype guidance that steers trajectories toward diverse class prototypes, and it uses a warm start and trust-region constraints to preserve detail. Across high-resolution benchmarks, PGFM matches or surpasses diffusion-based methods at substantially lower computational cost and with improved mode coverage, which suggests that flow matching is a practical alternative for scalable dataset distillation.

Abstract

Dataset distillation compresses large datasets into compact synthetic sets that yield comparable performance when used to train models. Despite recent progress, diffusion-based distillation methods typically depend on heuristic guidance or prototype assignment, which entails time-consuming sampling and unstable trajectories and thus hurts downstream generalization, especially under strong control or low images-per-class (IPC) budgets. We propose \emph{Path-Guided Flow Matching (PGFM)}, the first flow-matching-based framework for generative distillation, which enables fast, deterministic synthesis by solving an ODE in a few steps. PGFM performs flow matching in the latent space of a frozen VAE to learn class-conditional transport from Gaussian noise to the data distribution. In particular, we develop a continuous path-to-prototype guidance algorithm for ODE-consistent path control, which allows trajectories to land reliably on their assigned prototypes while preserving diversity and efficiency. Extensive experiments on high-resolution benchmarks show that PGFM matches or surpasses prior diffusion-based distillation approaches with fewer sampling steps, e.g., it is 7.6$\times$ more efficient than its diffusion-based counterparts, with 78\% mode coverage.
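To make the sampling idea concrete, the following is a minimal sketch of few-step Euler ODE sampling with early-stage prototype guidance bounded by a trust region. It is an illustration under stated assumptions, not the paper's implementation: `toy_velocity` is a hypothetical stand-in for the learned class-conditional velocity field, and the names `guide_until`, `guide_scale`, and `trust_ratio`, the time convention (t = 0 is noise, t = 1 is data), and the norm-clipping rule are all assumptions made for this example.

```python
import numpy as np


def toy_velocity(z, t, class_mean):
    """Hypothetical stand-in for a learned class-conditional velocity field.

    A real flow-matching model would be a neural network; here the field
    simply pulls the latent toward a fixed class mean.
    """
    return class_mean - z


def guided_euler_sample(z0, class_mean, prototype, num_steps=4,
                        guide_until=0.5, guide_scale=1.0, trust_ratio=0.5):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with fixed-step Euler.

    While t < guide_until, a guidance term pointing at the prototype is
    added. Its norm is clipped to trust_ratio * ||v|| (an assumed form of
    trust region) so guidance cannot overwhelm the base velocity.
    """
    z = np.asarray(z0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(z, t, class_mean)
        if t < guide_until and guide_scale > 0:
            g = guide_scale * (prototype - z)          # pull toward prototype
            max_norm = trust_ratio * np.linalg.norm(v)  # trust-region radius
            g_norm = np.linalg.norm(g)
            if g_norm > max_norm:
                g = g * (max_norm / g_norm)            # clip guidance norm
            v = v + g
        z = z + dt * v                                 # Euler update
    return z


# Toy 2-D latent: start at the origin, class mean and prototype differ.
z0 = np.zeros(2)
class_mean = np.array([1.0, 0.0])
prototype = np.array([1.0, 1.0])

plain = guided_euler_sample(z0, class_mean, prototype, guide_scale=0.0)
guided = guided_euler_sample(z0, class_mean, prototype)

# Guidance in the first half of the trajectory moves the endpoint
# closer to the assigned prototype than the unguided run.
dist_plain = np.linalg.norm(plain - prototype)
dist_guided = np.linalg.norm(guided - prototype)
```

In this toy setting the guided endpoint lies closer to the prototype than the unguided one, while the clipping keeps each guidance step no larger than half the base velocity. In the paper's setting the resulting latent would then be decoded by the frozen VAE.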

Paper Structure (36 sections, 17 equations, 8 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Dataset Distillation
Latent Space Generative Modeling
Conditional Flow Matching (CFM) and ODE Sampling
Proposed Method: PGFM
Latent Space Preprocessing
Flow-Matching Sampling
Path-guided control (PGFM)
Warm start
Latent Selection for Decoding
Experiments
Experimental Setup
Result Analysis
...and 21 more sections

Figures (8)

Figure 1: Motivation: Diffusion sampling denoises step-by-step, while flow matching uses deterministic ODE sampling that is already strong and smooth; PGFM adds lightweight prototype guidance to further improve performance.
Figure 2: Sampling process of PGFM: Starting from Gaussian noise, we sample with a pretrained flow-matching generator (GMFlow) while applying lightweight, early-stage prototype guidance with a trust region to improve mode coverage without washing out details.
Figure 3: t-SNE analysis. Visual comparison of latent distributions generated by (a) the FM baseline, (b) MGD$^3$, and (c) our PGFM. Markers denote synthetic images, real images, and prototypes. PGFM aligns better with the real latent distribution.
Figure 4: Representativeness and coverage analysis
Figure 5: Blurring synthetic images on ImageNette for $\text{IPC}=10$
...and 3 more figures
