Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model

Zilai Li

TL;DR

The paper tackles the heavy computational burden of diffusion-model sampling by introducing a training-free inference plugin that utilizes truncation-error analysis and hyperparameter-tuned ODE solvers, augmented with a Free-U decorator to modify the U-Net's skip connections. By operating in a latent-diffusion framework, the method enables 5–6 step generation for high-resolution images without retraining, and reports competitive or superior FID against state-of-the-art distillation models on COCO/LAION datasets. An information-theoretic analysis and ablation studies elucidate why certain hyperparameter couplings and final-stage denoising preserve image diversity while accelerating inference. The authors provide extensive experiments across 512×512 and 1024×1024 outputs, demonstrating robust performance and offering a public codebase for reproducibility.

Abstract

The diffusion model is a state-of-the-art generative model that samples images by applying a neural network iteratively. However, the original sampling algorithm is computationally expensive, and reducing the number of sampling steps is an active research area. One mainstream approach is to treat the sampling process as the numerical solution of an ordinary differential equation (ODE). Our study proposes a training-free inference plugin compatible with most few-step ODE solvers. To the best of our knowledge, our algorithm is the first training-free algorithm to sample a 1024 × 1024 image in 6 steps and a 512 × 512 image in 5 steps, with FID results that outperform the SOTA distillation models and the 20-step DPM++ 2M solver, respectively. Based on analyses of the latent diffusion model's structure, the diffusion ODE, and the Free-U mechanism, we explain why specific hyperparameter couplings improve stability and inference speed without retraining. The experimental results also reveal a new design space for latent diffusion ODE solvers. In addition, we analyze the difference between the original diffusion model and the diffusion distillation model through an information-theoretic study, which explains why a few-step ODE solver designed for the diffusion model can outperform training-based diffusion distillation algorithms in few-step inference. Preliminary experimental results support the mathematical analysis. The code is available at: https://github.com/TheLovesOfLadyPurple/Hyperparameter-is-all-you-need
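
To make the ODE view of sampling concrete, the following is a minimal sketch and not the paper's solver. It assumes the common noise-level parameterization dx/dσ = (x − D(x, σ)) / σ, where D is a denoiser, and replaces the neural network with a closed-form denoiser for one-dimensional, zero-mean Gaussian data. All function names and the five-step noise grid are hypothetical choices made for illustration.

```python
import math


def toy_denoiser(x, sigma, data_std=1.0):
    """Closed-form optimal denoiser for zero-mean Gaussian data.

    Assumption for illustration only: real samplers call a trained network here.
    """
    return x * data_std**2 / (data_std**2 + sigma**2)


def euler_ode_sample(x, sigmas, denoiser=toy_denoiser):
    """Integrate dx/dsigma = (x - D(x, sigma)) / sigma with first-order Euler steps."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        slope = (x - denoiser(x, sigma)) / sigma  # ODE right-hand side
        x = x + slope * (sigma_next - sigma)      # step to the next noise level
    return x


# Hypothetical five-step noise grid, ending at sigma = 0 (a clean sample).
sigmas = [10.0, 5.0, 2.0, 1.0, 0.5, 0.0]
x_init = 10.0  # a draw at the highest noise level

x_final = euler_ode_sample(x_init, sigmas)

# For this toy denoiser the ODE has the exact solution
# x(0) = x_init / sqrt(1 + sigma_max^2), so the solver's error is measurable.
x_exact = x_init / math.sqrt(1.0 + sigmas[0] ** 2)
truncation_error = abs(x_final - x_exact)
```

With so few steps, the gap between `x_final` and `x_exact` comes from the solver's truncation error, which is the quantity that few-step solvers and their hyperparameters aim to keep small.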

Paper Structure

This paper contains 14 sections, 23 equations, 8 figures, 7 tables, 1 algorithm.

Figures (8)

  • Figure 1: Comparison between our algorithm and the UniPC solver in few-step inference.
  • Figure 2: Comparison between our custom DPM++ 1s sampler and the original DPM++ 1s sampler. The first row shows the trajectory of the original DPM++ 1s sampler with Karras' schedule, the second row shows Karras' schedule + Free-U, and the last row shows our sampler. The results show that applying Free-U in few-step sampling without additional tricks degrades the generated result.
  • Figure 3: Comparison between different discretization methods using the same DPM++ 1s sampler in few-step inference. The first row shows the trajectory of the normal Karras discretization. The second row shows the Karras discretization + Free-U. The third row shows the normal discretization with a normal U-Net, and the fourth row shows the normal discretization with Free-U. The fourth row is similar to the first.
  • Figure 4: Results when Free-U is first applied at different steps of the UniPC solver.
  • Figure 5: Comparison across image resolutions and image types. The first row contains 512 × 512 images generated by Stable Diffusion 1.5, and the second row contains 1024 × 1024 images generated by Dreamshaper XL. The last rows contain high-resolution ACGN images generated by Illustrij Evo.
  • ...and 3 more figures
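
The captions above refer to Karras' schedule. As a hedged illustration, the sketch below computes the noise-level discretization proposed by Karras et al., which interpolates linearly in σ^(1/ρ) between the largest and smallest noise levels. The function name and the default values of `sigma_min` and `sigma_max` are illustrative assumptions, not settings taken from this paper; ρ = 7 is the value recommended in the original schedule.

```python
def karras_sigmas(n, sigma_min=0.03, sigma_max=14.6, rho=7.0):
    """Return n decreasing noise levels from sigma_max down to sigma_min.

    sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
    The default sigma_min and sigma_max are placeholder values for illustration.
    """
    if n < 2:
        raise ValueError("need at least two noise levels")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    return [
        (max_inv_rho + i / (n - 1) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(n)
    ]


# Five noise levels, as a few-step sampler might use.
schedule = karras_sigmas(5)
```

A larger ρ places more of the levels near `sigma_min`, so a few-step sampler spends more of its limited budget at low noise.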