Step Saver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation

Jean Yu, Haim Barad

TL;DR

This work tackles slow diffusion-based image generation by predicting, for each prompt, the minimum number of denoising steps needed to maintain visual quality. It introduces StepSaver, which combines SSIM-based optimization with an NLP-based denoising-steps recommender to deliver real-time, prompt-specific step counts that are often far lower than conventional fixed settings. The approach yields significant runtime savings while matching or improving image quality as measured by FID, with results validated on a large LAION-Aesthetics-derived dataset and a substantial real-time evaluation. The proposed framework is compatible with multiple schedulers and demonstrates practical impact for resource-constrained, high-throughput image generation pipelines.

Abstract

In this paper, we introduce an NLP model fine-tuned to determine the minimal number of denoising steps required for any given text prompt. The model serves as a real-time tool that recommends the number of denoising steps needed to generate high-quality images efficiently. It is designed to work with diffusion models, so that high-quality images are produced in the shortest possible time. Although our explanation focuses on the DDIM scheduler, the methodology is adaptable and can be applied to other schedulers such as Euler, Euler Ancestral, Heun, DPM2 Karras, and UniPC. This model allows our customers to conserve costly computing resources by executing the fewest denoising steps necessary to achieve optimal quality in the produced images.
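The idea of a per-prompt minimum step count can be illustrated with a small sketch. It is not the authors' procedure, which this summary does not spell out. It assumes that, for one prompt, an image has already been generated at several candidate step counts and that each has been scored by SSIM against a reference image (for example, the image generated at the largest step count). The function name, the 0.9 threshold, and the example scores are all hypothetical.

```python
from typing import Mapping


def min_steps_by_ssim(ssim_by_steps: Mapping[int, float], threshold: float = 0.9) -> int:
    """Pick the smallest step count that keeps SSIM at or above a threshold.

    ssim_by_steps maps a candidate step count to the SSIM of its image
    against a reference image (an assumed setup, not taken from the paper).
    Walking down from the largest count, we stop at the first score below
    the threshold, so a lucky low-step score is ignored when a larger
    step count in between falls short.
    """
    if not ssim_by_steps:
        raise ValueError("ssim_by_steps must not be empty")

    steps_desc = sorted(ssim_by_steps, reverse=True)
    best = steps_desc[0]  # fall back to the largest candidate
    for steps in steps_desc:
        if ssim_by_steps[steps] < threshold:
            break
        best = steps
    return best


# Hypothetical scores for a single prompt.
scores = {10: 0.62, 20: 0.81, 30: 0.93, 40: 0.97, 50: 1.00}
recommended = min_steps_by_ssim(scores, threshold=0.9)
```

Labels produced this way for many prompts could then serve as training targets for a text model that predicts the step count directly from the prompt, which is the role the abstract assigns to the fine-tuned NLP model.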

Paper Structure (15 sections, 11 figures)

This paper contains 15 sections, 11 figures.

Figures (11)

  • Figure 1: The Diffusion Process
  • Figure 2: Visual display of generated images from 6 prompts and why denoise steps matter
  • Figure 3: Deep Dive – Generated image for prompt “Joel Robison The Black Dog”
  • Figure 4: Deep Dive – Generated images for prompt “Slow Cooker Salsa Chicken Tacos | via Midwest Nice Blog”
  • Figure 5: Average Image Generation Time vs Denoise Steps (Accelerator: Habana Gaudi1)
  • ...and 6 more figures