Path-Guided Flow Matching for Dataset Distillation

Xuhui Li, Zhengquan Luo, Xiwei Liu, Yongqiang Yu, Zhiqiang Xu

TL;DR

Dataset distillation seeks to compress large datasets into small, representative sets without sacrificing performance. This work introduces Path-Guided Flow Matching (PGFM), the first flow-matching-based framework for generative dataset distillation, operating in a frozen VAE latent space and solving an ODE with few steps. PGFM adds lightweight prototype guidance to steer trajectories toward diverse class prototypes while employing warm-start and trust-region constraints to preserve detail, achieving strong performance with dramatically reduced computation. Across high-resolution benchmarks, PGFM matches or surpasses diffusion-based methods with significantly higher efficiency and improved mode coverage, illustrating flow matching as a practical alternative for scalable dataset distillation.

Abstract

Dataset distillation compresses large datasets into compact synthetic sets that train models to comparable performance. Despite recent progress on diffusion-based distillation, these methods typically depend on heuristic guidance or prototype assignment, which entails time-consuming sampling and trajectory instability and thus hurts downstream generalization, especially under strong control or low IPC (images per class). We propose Path-Guided Flow Matching (PGFM), the first flow matching-based framework for generative distillation, which enables fast deterministic synthesis by solving an ODE in a few steps. PGFM conducts flow matching in the latent space of a frozen VAE to learn class-conditional transport from Gaussian noise to the data distribution. In particular, we develop a continuous path-to-prototype guidance algorithm for ODE-consistent path control, which allows trajectories to reliably land on assigned prototypes while preserving diversity and efficiency. Extensive experiments across high-resolution benchmarks demonstrate that PGFM matches or surpasses prior diffusion-based distillation approaches with fewer sampling steps and markedly improved efficiency, e.g., 7.6× more efficient than its diffusion-based counterparts with 78% mode coverage.
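To make the sampling idea concrete, the following is a minimal sketch of few-step Euler integration of an ODE with an early-stage pull toward a prototype, where the pull is clipped to a trust region. It is an illustration under stated assumptions, not the paper's algorithm: the velocity field is a hypothetical toy (a linear pull toward a class mean) standing in for a pretrained flow-matching generator, and the guidance rule, the schedule (`guide_until`), and the trust-region definition (`radius` on the L2 norm of the guidance term) are assumed forms chosen for simplicity. All function and parameter names are invented for this example.

```python
import math


def toy_velocity(x, t, class_mean):
    """Hypothetical stand-in for a pretrained class-conditional velocity field.

    A real model would be a neural network evaluated at (x, t, class label).
    This toy ignores t and simply pulls x toward the class mean.
    """
    return [m - xi for xi, m in zip(x, class_mean)]


def clip_to_trust_region(delta, radius):
    """Rescale delta so that its L2 norm does not exceed radius."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > radius:
        return [d * radius / norm for d in delta]
    return list(delta)


def guided_euler_sample(x0, class_mean, prototype, num_steps=8,
                        guide_until=0.5, guide_scale=1.0, radius=0.5):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    While t < guide_until, an assumed guidance term (a pull toward the
    assigned prototype, clipped to the trust region) is added to v.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = toy_velocity(x, t, class_mean)
        if t < guide_until:
            pull = [guide_scale * (p - xi) for xi, p in zip(x, prototype)]
            pull = clip_to_trust_region(pull, radius)
            v = [vi + gi for vi, gi in zip(v, pull)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def distance(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# A fixed start point is used instead of Gaussian noise so the run is deterministic.
x0 = [0.0, 0.0]
class_mean = [1.0, 0.0]
prototype = [1.0, 1.0]

unguided = guided_euler_sample(x0, class_mean, prototype, guide_scale=0.0)
guided = guided_euler_sample(x0, class_mean, prototype)
print("unguided distance to prototype:", distance(unguided, prototype))
print("guided distance to prototype:", distance(guided, prototype))
```

In this toy setting the guided sample ends closer to the prototype than the unguided one, while the clipping bounds how far any single step can be pushed away from the base velocity. Restricting guidance to early steps leaves the later steps to the base field alone, which is one plausible reading of how early-stage guidance can avoid washing out detail.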

Paper Structure

This paper contains 36 sections, 17 equations, 8 figures, 11 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Motivation: Diffusion sampling denoises step-by-step, while flow matching uses deterministic ODE sampling that is already strong and smooth; PGFM adds lightweight prototype guidance to further improve performance.
  • Figure 2: Sampling process of PGFM: Starting from Gaussian noise, we sample with a pretrained flow-matching generator (GMFlow) while applying lightweight, early-stage prototype guidance with a trust region to improve mode coverage without washing out details.
  • Figure 3: t-SNE Analysis. Visual comparison of latent distributions. (a) Generated by FM baseline. (b) Generated by MGD$^3$. (c) Generated by our PGFM. Markers distinguish synthetic images, real images, and prototypes. PGFM aligns better with the real latent distribution.
  • Figure 4: Representativeness and coverage analysis
  • Figure 5: Blurring synthetic images on ImageNette for $\text{IPC}=10$
  • ...and 3 more figures