Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching

Pengwei Sun

Abstract

Timestep sampling $p(t)$ is a central design choice in Flow Matching models, yet common practice increasingly favors static middle-biased distributions (e.g., Logit-Normal). We show that this choice induces a speed-quality trade-off: middle-biased sampling accelerates early convergence but yields worse asymptotic fidelity than Uniform sampling. By analyzing per-timestep training losses, we identify a U-shaped difficulty profile with persistent errors near the boundary regimes, implying that under-sampling the endpoints leaves fine details unresolved. Guided by this insight, we propose Curriculum Sampling, a two-phase schedule that begins with middle-biased sampling for rapid structure learning and then switches to Uniform sampling for boundary refinement. On CIFAR-10, Curriculum Sampling improves the best FID from $3.85$ (Uniform) to $3.22$ while reaching peak performance at $100$k rather than $150$k training steps. Our results highlight that timestep sampling should be treated as an evolving curriculum rather than a fixed hyperparameter.
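
The two-phase schedule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the switch point `switch_step` and the Logit-Normal parameters `mu` and `sigma` are placeholder assumptions chosen for illustration, and the function names are hypothetical. The sketch draws $t$ from a Logit-Normal distribution (a sigmoid applied to a Gaussian sample) during the first phase and from a Uniform distribution afterwards.

```python
import math
import random


def sample_logit_normal(rng, mu=0.0, sigma=1.0):
    """Draw t in (0, 1) as sigmoid(z) with z ~ Normal(mu, sigma)."""
    z = rng.gauss(mu, sigma)
    return 1.0 / (1.0 + math.exp(-z))


def sample_timestep(step, rng, switch_step=50_000, mu=0.0, sigma=1.0):
    """Two-phase curriculum: middle-biased first, Uniform afterwards.

    switch_step, mu and sigma are illustrative assumptions, not values
    taken from the paper.
    """
    if step < switch_step:
        return sample_logit_normal(rng, mu, sigma)
    return rng.random()


def boundary_fraction(ts, eps=0.1):
    """Fraction of samples within eps of either endpoint (t=0 or t=1)."""
    return sum(t < eps or t > 1 - eps for t in ts) / len(ts)


rng = random.Random(0)
early = [sample_timestep(0, rng) for _ in range(10_000)]       # phase 1
late = [sample_timestep(60_000, rng) for _ in range(10_000)]   # phase 2

# Phase 1 rarely visits the endpoints; phase 2 covers them uniformly.
print(boundary_fraction(early), boundary_fraction(late))
```

In a training loop, `sample_timestep(step, rng)` would simply replace the fixed sampler; with these assumed parameters the first phase places only a few percent of its samples within 0.1 of the endpoints, while the Uniform phase places about 20% there.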

Paper Structure (17 sections, 7 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Comparison of best achieved FID scores across non-curriculum sampling strategies. Methods are sorted from best (lowest FID) to worst.
  • Figure 2: Comparison of Fréchet Inception Distance (FID) scores over the first 100k training steps for Uniform sampling versus Logit-Normal (LN, $\mu \in \{-0.8, -0.4\}, \sigma=1$) and Mode ($s=1.0$) sampling.
  • Figure 3: Time-Dependent Training Loss Dynamics and Sampling Densities. Top row: Evolution of the loss over the diffusion time $t \in [0, 1]$ throughout training (color gradient from 10k to 100k steps). Bottom row: Histograms of sampled time steps $t \sim p(t)$ from each distribution. Columns represent: (a) Uniform sampling, (b) Mode sampling ($s=-0.5$), which skews towards $t=0$ and $t=1$, (c) Logit-Normal sampling ($\mu=0.8$), shifting focus towards the noise-dominant regime ($t$ close to 1), and (d) Logit-Normal sampling ($\mu=-0.4$), emphasizing the mid-trajectory.
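
A per-timestep loss profile of the kind plotted in the top row of Figure 3 could be computed by bucketing per-sample losses by their sampled $t$. The sketch below is an assumption about how such a profile might be built, not the authors' code: the helper name `binned_loss` and the bin count are hypothetical, and the input values are made-up numbers rather than results from the paper.

```python
def binned_loss(timesteps, losses, num_bins=10):
    """Average loss per equal-width bin of t over [0, 1].

    Returns a list of length num_bins; bins with no samples are None.
    """
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for t, loss in zip(timesteps, losses):
        # Clamp so that t == 1.0 falls into the last bin.
        idx = min(int(t * num_bins), num_bins - 1)
        sums[idx] += loss
        counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up values for illustration only (not data from the paper).
profile = binned_loss([0.05, 0.15, 0.95, 1.0], [1.0, 2.0, 3.0, 5.0])
print(profile)
```

Logging this profile at several checkpoints would show whether the errors near $t=0$ and $t=1$ persist as training progresses.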