Step Saver: Predicting Minimum Denoising Steps for Diffusion Model Image Generation

Jean Yu; Haim Barad

TL;DR

This work tackles slow diffusion-based image generation by predicting the minimum number of denoising steps each prompt needs to maintain visual quality. It introduces StepSaver, which combines SSIM-based optimization with an NLP-based denoise-steps recommender to deliver real-time, prompt-specific step counts that are often far lower than conventional fixed settings. The approach yields significant runtime savings while matching or improving image quality as measured by FID, with results validated on a large LAION-Aesthetics-derived dataset and a substantial real-time evaluation. The proposed framework is compatible with multiple schedulers and demonstrates practical impact for resource-constrained, high-throughput image generation pipelines.

Abstract

In this paper, we introduce an NLP model fine-tuned to determine the minimal number of denoising steps required for any given text prompt. The model serves as a real-time tool that recommends the number of denoise steps needed to generate high-quality images efficiently. It is designed to work with diffusion models so that high-quality images are produced in the shortest possible time. Although our explanation focuses on the DDIM scheduler, the methodology can be applied to other schedulers such as Euler, Euler Ancestral, Heun, DPM2 Karras, and UniPC. This model allows our customers to conserve costly computing resources by executing the fewest denoising steps necessary to achieve optimal quality in the produced images.
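The summary above does not spell out how SSIM is turned into a step count, so the following is only a minimal sketch of one plausible reading: generate the same prompt at several step counts, compare each image to a high-step reference, and keep the smallest step count whose SSIM clears a threshold. The function names, the 0.9 threshold, the choice of reference, and the single-window SSIM (instead of the usual sliding-window variant) are all assumptions made for illustration, not the authors' procedure.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length flat pixel sequences.

    Simplification (assumption): real SSIM implementations average the
    index over local sliding windows; here the whole image is one window.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # standard stabilising constants
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return numerator / denominator


def minimum_steps(images_by_steps, reference_steps, threshold=0.9):
    """Smallest step count whose image reaches `threshold` SSIM vs the reference.

    `images_by_steps` maps a step count to a flat pixel list (hypothetical
    layout); `threshold` is an illustrative value, not taken from the paper.
    """
    reference = images_by_steps[reference_steps]
    for steps in sorted(images_by_steps):
        if global_ssim(images_by_steps[steps], reference) >= threshold:
            return steps
    return reference_steps


# Toy data standing in for generated images at 5, 10, and 50 denoise steps.
reference = [10, 50, 90, 130, 170, 210]
images = {
    5: [255 - p for p in reference],   # structurally inverted: poor match
    10: [p + 1 for p in reference],    # nearly identical to the reference
    50: reference,
}
chosen = minimum_steps(images, reference_steps=50)
```

In this toy setup `chosen` is 10, because the 5-step image fails the threshold while the 10-step image is almost indistinguishable from the reference. Per-prompt labels of this kind are what a recommender could be trained to predict from the prompt text alone.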

Paper Structure (15 sections, 11 figures)

Introduction
Diffusion Process
Existing Solutions
Details of StepSaver
Method to associate between denoise steps and image quality
Research Environment
Defining the Optimal Denoise Steps
Optimal Denoise Steps Setting - Impact on Quality of Generated Images
Optimal Denoise Steps Setting - Impact on Performance
Calculating the Optimal Denoise Steps Based on SSIM Metric
Metric for Measuring Image Quality
Denoise Steps Recommender Service (based on a new NLP model)
Dataset Preprocessing for NLP Model Training
Model Training
Using Flexi-Steps Recommended by the NLP Model

Figures (11)

Figure 1: The Diffusion Process
Figure 2: Visual display of generated images from 6 prompts and why denoise steps matter
Figure 3: Deep Dive – Generated image for prompt “Joel Robison The Black Dog”
Figure 4: Deep Dive – Generated images for prompt “Slow Cooker Salsa Chicken Tacos | via Midwest Nice Blog”
Figure 5: Average Image Generation Time vs Denoise Steps (Accelerator: Habana Gaudi1)
...and 6 more figures
