Analyzing and Improving Fast Sampling of Text-to-Image Diffusion Models

Zhenyu Zhou, Defang Chen, Siwei Lyu, Chun Chen, Can Wang

TL;DR

This paper proposes the constant total rotation schedule (TORS), a scheduling strategy that ensures uniform geometric variation along the sampling trajectory. TORS outperforms previous training-free acceleration methods and produces high-quality images with 10 sampling steps on Flux.1-Dev.

Abstract

Text-to-image diffusion models have achieved unprecedented success but still struggle to produce high-quality results under limited sampling budgets. Existing training-free sampling acceleration methods are typically developed independently, leaving the overall performance and compatibility among these methods unexplored. In this paper, we bridge this gap by systematically elucidating the design space, and our comprehensive experiments identify the sampling time schedule as the most pivotal factor. Inspired by the geometric properties of diffusion models revealed through the Frenet-Serret formulas, we propose constant total rotation schedule (TORS), a scheduling strategy that ensures uniform geometric variation along the sampling trajectory. TORS outperforms previous training-free acceleration methods and produces high-quality images with 10 sampling steps on Flux.1-Dev and Stable Diffusion 3.5. Extensive experiments underscore the adaptability of our method to unseen models, hyperparameters, and downstream applications.

Paper Structure (23 sections, 8 equations, 19 figures, 6 tables)

Figures (19)

  • Figure 1: Text-to-image generation using Flux.1-Dev. Our proposed TORS attains image quality comparable to the 50-step baseline using only 10 steps.
  • Figure 2: (a) Diffusion sampling follows a pre-defined outer schedule $\{t_n\}_{n=0}^N$, while an inner schedule determines whether each step performs computation or feature reuse. A sample $\mathbf{x}_{t_{n+1}}$ transitions from $t_{n+1}$ to $t_n$ with the solver's update rule, utilizing stored historical velocities. Various types of cache objects can be selected, and a feature predictor is employed to estimate the current feature $\Delta_{t_n}$. This perspective also applies to U-Net architectures. (b) Four types of cache objects: velocity cache (V. cache), transformer cache (T. cache), block cache (B. cache), and operation cache (O. cache).
  • Figure 3: Comprehensive experiments examining the impact of each acceleration method on sampling acceleration. The experiments are conducted on the Flux.1-Dev model. The outer schedule is identified as the most influential factor (see the vertical scale range).
  • Figure 4: Text-to-image generation with Flux.1-Dev using the default uniform outer schedule. As the number of sampling steps increases, the image structure continues to change and only stabilizes after around 30 steps.
  • Figure 5: We extend the insights of sampling regularity from diffusion models to flow-based models and propose TORS by utilizing the geometric properties of sampling trajectories. (a) Visualization of 100 sampling trajectories generated by Flux.1-Dev reveals a strong trajectory regularity. All sampling trajectories starting from Gaussian noise are uniformly shifted to the origin. (b) A 10-step example of our proposed TORS, which ensures a constant total rotation change $\int_0^S|\omega(s)|\mathrm{d} s$ along the sampling trajectory. (c) TORS yields considerably faster structural convergence compared with the uniform outer schedule (Figure 4).
  • ...and 14 more figures
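
The constant-total-rotation idea in Figure 5(b) can be illustrated with a small sketch. This is not the authors' implementation: it assumes that the total rotation $\int_0^S|\omega(s)|\mathrm{d} s$ can be approximated by summing the angles between consecutive velocity directions along a densely sampled reference trajectory, and that the outer schedule is then chosen so that each step covers an equal share of that sum. The function names `cumulative_rotation` and `equal_rotation_schedule`, the toy 2D trajectory, and the time convention (0 to 1) are all hypothetical and chosen only for illustration.

```python
import numpy as np


def cumulative_rotation(velocities):
    """Cumulative angle (radians) turned by the velocity direction.

    Assumption: rotation is discretized as the angle between consecutive
    velocity vectors of a densely sampled trajectory.
    """
    v = np.asarray(velocities, dtype=np.float64)
    v = v.reshape(v.shape[0], -1)  # flatten each velocity to a vector
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    angles = np.arccos(cos)  # non-negative turning angles
    return np.concatenate([[0.0], np.cumsum(angles)])


def equal_rotation_schedule(dense_times, velocities, num_steps):
    """Pick num_steps + 1 time points with equal rotation between neighbors."""
    rot = cumulative_rotation(velocities)
    targets = np.linspace(0.0, rot[-1], num_steps + 1)
    # Invert the (non-decreasing) cumulative rotation by linear interpolation.
    return np.interp(targets, rot, dense_times)


# Toy example (hypothetical): the velocity direction turns by
# phi(t) = (pi / 2) * t**2, so most of the rotation happens late in time.
dense_times = np.linspace(0.0, 1.0, 2001)
phi = 0.5 * np.pi * dense_times**2
velocities = np.stack([np.cos(phi), np.sin(phi)], axis=1)

schedule = equal_rotation_schedule(dense_times, velocities, num_steps=10)
print(np.round(schedule, 3))
```

In this toy case the equal-rotation time points follow $t_k = \sqrt{k/10}$, so steps are large where the direction barely changes and small where it turns quickly. A uniform schedule would instead spend the same step size everywhere, regardless of how much the trajectory bends.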