SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow

Yuanzhi Zhu, Xingchao Liu, Qiang Liu

TL;DR

This work addresses two bottlenecks of diffusion models, slow iterative sampling and large model size, by introducing SlimFlow, a framework for training compact one-step diffusion models within the rectified flow paradigm. It combines Annealing Reflow, which warm-starts a small student by gradually shifting training from random pairs to teacher pairs, with Flow-Guided Distillation, a regularized distillation that uses few-step guidance from a 2-rectified flow to improve one-step generation. On CIFAR10, SlimFlow reaches an FID of 5.02 with 15.7M parameters, improving on the previous state-of-the-art one-step diffusion model (FID 6.47, 19.4M parameters). On FFHQ 64×64 and ImageNet 64×64, it yields small one-step models that are comparable to larger ones. SlimFlow therefore targets fast, resource-efficient generation for on-device and compute-constrained applications while preserving generation quality.
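To make the one-step versus few-step distinction concrete, here is a minimal sketch of Euler sampling along a velocity field. It assumes the convention that noise sits at t=0 and data at t=1; the two toy fields and all function names are hypothetical stand-ins, not the paper's networks. It illustrates why straight trajectories, the goal of reflow, tolerate very few solver steps.

```python
import numpy as np

def euler_sample(velocity, noise, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 with uniform Euler steps."""
    x = np.array(noise, dtype=float)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # always < 1, so the straight toy field never divides by zero
        x = x + dt * velocity(x, t)
    return x

def straight_velocity(x, t, target=3.0):
    """Toy field (assumption) whose trajectories are straight lines ending at `target`."""
    return (target - x) / (1.0 - t)

def curved_velocity(x, t):
    """Toy field (assumption) whose velocity changes along the path."""
    return -x

noise = np.array([-1.0, 0.5, 2.0])
one_step_straight = euler_sample(straight_velocity, noise, num_steps=1)
two_step_straight = euler_sample(straight_velocity, noise, num_steps=2)
one_step_curved = euler_sample(curved_velocity, noise, num_steps=1)
two_step_curved = euler_sample(curved_velocity, noise, num_steps=2)
```

On the straight field, one and two Euler steps land on the same endpoint. On the curved field the result depends on the step count, which is the situation a one-step generator has to avoid.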

Abstract

Diffusion models excel in high-quality generation but suffer from slow inference due to iterative sampling. While recent methods have successfully transformed diffusion models into one-step generators, they neglect model size reduction, limiting their applicability in compute-constrained scenarios. This paper aims to develop small, efficient one-step diffusion models based on the powerful rectified flow framework, by exploring joint compression of inference steps and model size. The rectified flow framework trains one-step generative models using two operations, reflow and distillation. Compared with the original framework, squeezing the model size brings two new challenges: (1) the initialization mismatch between large teachers and small students during reflow; (2) the underperformance of naive distillation on small student models. To overcome these issues, we propose Annealing Reflow and Flow-Guided Distillation, which together comprise our SlimFlow framework. With our novel framework, we train a one-step diffusion model with an FID of 5.02 and 15.7M parameters, outperforming the previous state-of-the-art one-step diffusion model (FID=6.47, 19.4M parameters) on CIFAR10. On ImageNet 64$\times$64 and FFHQ 64$\times$64, our method yields small one-step diffusion models that are comparable to larger models, showcasing the effectiveness of our method in creating compact, efficient one-step diffusion models.
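The shift from random pairs to teacher pairs in Annealing Reflow can be sketched as follows. This is only an illustration under stated assumptions: the variance-preserving blend of teacher noise with fresh noise, the linear schedule, and every name below (`beta_schedule`, `annealed_pair`, `reflow_loss`) are hypothetical and not taken from the text above. The convention is noise at t=0 and image at t=1.

```python
import numpy as np

def beta_schedule(step, total_steps):
    """Hypothetical linear schedule: 1.0 (random pairs) down to 0.0 (teacher pairs)."""
    return max(0.0, 1.0 - step / total_steps)

def annealed_pair(teacher_noise, teacher_image, beta, rng):
    """Blend the teacher-paired noise with fresh noise (assumed form).

    beta=1 gives noise independent of the image (a random pair);
    beta=0 returns the teacher's own (noise, image) pair.
    The sqrt(1 - beta^2) weight keeps the blended noise at unit variance.
    """
    fresh = rng.standard_normal(teacher_noise.shape)
    noise = np.sqrt(1.0 - beta**2) * teacher_noise + beta * fresh
    return noise, teacher_image

def reflow_loss(velocity_fn, noise, image, t):
    """Mean squared error between predicted velocity and the straight-line target."""
    x_t = (1.0 - t) * noise + t * image   # point on the line between the pair
    target = image - noise                # constant velocity of that line
    return float(np.mean((velocity_fn(x_t, t) - target) ** 2))

rng = np.random.default_rng(0)
teacher_noise = rng.standard_normal((4, 8))
teacher_image = rng.standard_normal((4, 8))  # stand-in for teacher-generated images
t = rng.uniform(size=(4, 1))                 # one time value per sample
beta = beta_schedule(step=50, total_steps=100)
noise, image = annealed_pair(teacher_noise, teacher_image, beta, rng)
loss = reflow_loss(lambda x, t: np.zeros_like(x), noise, image, t)
```

Early in training the student sees nearly independent pairs. As `beta` decays to zero the pairs become the teacher's own, so the objective moves smoothly toward standard reflow.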

Paper Structure (17 sections, 14 equations, 11 figures, 8 tables, 1 algorithm)

Figures (11)

  • Figure 1: (a) Comparison of different one-step diffusion models on the CIFAR10 dataset. (b) To obtain a powerful one-step diffusion model, the SlimFlow framework uses two stages. Annealing Reflow provides a warm start for the small 2-Rectified Flow model by gradually shifting from training with random pairs to training with teacher pairs. Flow-Guided Distillation enhances the small one-step model by distilling from the 2-Rectified Flow, using both offline data generated with a precise ODE solver and online data generated with a 2-step Euler solver.
  • Figure 2: (a) Generation from a 1-Rectified Flow trained without data augmentation. (b) Generation from the same model after horizontally flipping the set of random noises used in (a). (c) Horizontally flipping the noise yields a horizontally flipped image, whereas vertically flipping the noise does not yield a vertically flipped image.
  • Figure 3: Random generation from our best one-step small models on three different datasets. Left: CIFAR10 32$\times$32 (#Params=27.9M). Mid: FFHQ 64$\times$64 (#Params=27.9M). Right: ImageNet 64$\times$64 (#Params=80.7M).
  • Figure 4: (a) Comparison of models trained with different methods on CIFAR10. (b) Comparison between 2-rectified flow and the distilled one-step generator on CIFAR10.
  • Figure 5: CIFAR10 samples from 2-rectified flow models trained with Annealing Reflow. All images are generated with the same set of random noises.
  • ...and 6 more figures
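
The flip behaviour described in the Figure 2 caption can be checked numerically with a small helper. The sketch below measures the gap between generating from flipped noise and flipping the generated image. `toy_generator` is a hypothetical stand-in chosen to reproduce the reported pattern (horizontal flips commute with it, vertical flips do not); it is not the paper's model.

```python
import numpy as np

def flip_equivariance_gap(generator, noise, axis):
    """Mean absolute difference between G(flip(z)) and flip(G(z)) along `axis`."""
    out_of_flipped = generator(np.flip(noise, axis=axis))
    flipped_out = np.flip(generator(noise), axis=axis)
    return float(np.mean(np.abs(out_of_flipped - flipped_out)))

def toy_generator(noise):
    """Hypothetical generator: pointwise nonlinearity plus a top-to-bottom ramp."""
    height = noise.shape[-2]
    ramp = np.linspace(-1.0, 1.0, height)[:, None]  # varies over rows, constant over columns
    return np.tanh(noise) + ramp

rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 4, 4))  # (batch, height, width)
horizontal_gap = flip_equivariance_gap(toy_generator, noise, axis=-1)
vertical_gap = flip_equivariance_gap(toy_generator, noise, axis=-2)
```

A gap near zero along the width axis together with a clearly non-zero gap along the height axis corresponds to the behaviour shown in panel (c). A trained model would replace `toy_generator` in such a check.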