AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation

Shengkun Tang, Yaqing Wang, Caiwen Ding, Yi Liang, Yao Li, Dongkuan Xu

TL;DR

AdaDiff tackles the slow generation of diffusion models by introducing step-wise adaptive computation guided by a timestep-aware Uncertainty Estimation Module (UEM) and an uncertainty-aware layer-wise loss (UAL). The framework enables dynamic exits during multi-step denoising, balancing speed and quality, and is trained with a joint objective that couples the standard denoising loss with uncertainty-guided regularizers. Across CIFAR-10, CelebA, ImageNet, and MS-COCO, AdaDiff substantially reduces inference cost (roughly 40–48% fewer layers executed) with minimal FID degradation, outperforming static exits and other acceleration baselines. The work also shows that the uncertainty-weighted loss can improve full-model performance, and it provides uncertainty maps that illustrate when and where computation is saved, which matters for real-time diffusion-based generation.
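
The joint objective mentioned above can be pictured with a small sketch. This summary does not give the paper's loss formulas, so everything below is an assumption made for illustration: the `(1 - u)` weighting of each layer's error, the `tanh` pseudo-label used to fit the uncertainty head, the weight `lam`, and the function names are all hypothetical.

```python
import math

def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def joint_loss(layer_preds, uncertainties, target, lam=1.0):
    """Toy joint objective for one training example.

    layer_preds   : noise predictions, one per layer; the last is the full model's
    uncertainties : uncertainty estimates in [0, 1], one per intermediate layer
    target        : ground-truth noise
    lam           : hypothetical weight on the two regularizers
    """
    denoise = mse(layer_preds[-1], target)   # standard denoising loss
    layerwise = 0.0                          # uncertainty-weighted layer-wise loss
    uem_fit = 0.0                            # loss that trains the uncertainty head
    for pred, u in zip(layer_preds[:-1], uncertainties):
        err = mse(pred, target)
        layerwise += (1.0 - u) * err         # ASSUMPTION: confident layers weigh more
        pseudo = math.tanh(err)              # ASSUMPTION: pseudo-label in [0, 1)
        uem_fit += (u - pseudo) ** 2
    return denoise + lam * (layerwise + uem_fit)
```

With this weighting, a layer that the uncertainty head marks as confident (small `u`) is pushed harder to match the target, which is one plausible way to narrow the gap between early exits and the full model.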

Abstract

Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application, especially in real-time scenarios, is hampered by their inherently slow generation speed. The slow generation stems from the necessity of multi-step network inference. While certain predictions benefit from the full computation of the model in each sampling iteration, not every iteration requires the same amount of computation, so running the full model at every step can be inefficient. Unlike typical adaptive computation challenges that deal with single-step generation problems, diffusion processes with multi-step generation need to dynamically adjust their computational resource allocation based on the ongoing assessment of each step's importance to the final image output, presenting a unique set of challenges. In this work, we propose AdaDiff, an adaptive framework that dynamically allocates computation resources in each sampling step to improve the generation efficiency of diffusion models. To assess the effects of changes in computational effort on image quality, we present a timestep-aware uncertainty estimation module (UEM). Integrated at each intermediate layer, the UEM evaluates the predictive uncertainty. This uncertainty measurement serves as an indicator for determining whether to terminate the inference process. Additionally, we introduce an uncertainty-aware layer-wise loss aimed at bridging the performance gap between full models and their adaptive counterparts.
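
The exit mechanism described in the abstract can be sketched as follows. This is a minimal toy in plain Python, not the authors' implementation: the linear-plus-sigmoid form of the uncertainty head, the way the timestep is appended to the features, and the names `uem` and `adaptive_forward` are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def uem(hidden, t, weights, bias):
    """Toy timestep-aware uncertainty head (ASSUMED form): a sigmoid over a
    linear map of the hidden state concatenated with a normalized timestep."""
    feats = list(hidden) + [t]
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)

def adaptive_forward(x, t, layers, uem_params, threshold):
    """Run layers in order and stop once the estimated uncertainty
    falls below `threshold`.

    Returns (hidden_state, number_of_layers_executed). A real model would
    still map the hidden state to a noise prediction; that head is omitted.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i < len(layers) - 1:              # the last layer has no exit head
            w, b = uem_params[i]
            if uem(h, t, w, b) < threshold:  # confident enough: skip the rest
                return h, i + 1
    return h, len(layers)
```

In a sampler, `adaptive_forward` would be called once per denoising step, so the number of executed layers can differ from step to step. Raising `threshold` trades quality for speed, since more steps exit early.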

Paper Structure (14 sections, 11 equations, 6 figures, 6 tables)

This paper contains 14 sections, 11 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Average Mean Squared Error (MSE) loss between intermediate layer outputs and the final layer output across testing samples in the CIFAR-10, CelebA, and ImageNet datasets. The relatively low MSE values before the final layer suggest that earlier layers have the potential to generate results comparable to those of the final layer. This observation motivates the exploration of adaptive computation approaches.
  • Figure 2: An overview of our proposed AdaDiff method. At each step, the output of each intermediate layer is fed into the Uncertainty Estimation Module (UEM) to quantify the uncertainty. If the estimated uncertainty falls below a predefined threshold, the subsequent layers are skipped, allowing the network to adaptively adjust computation.
  • Figure 3: Average MSE loss across 1,000 denoising steps on the CIFAR-10 and CelebA datasets, where step 1000 is the first generation step and step 0 is the final generation step. The varying loss values at different denoising steps indicate that the difficulty of denoising varies throughout the generation process.
  • Figure 4: Comparison of generated samples from the model without adaptive computation (left) and the model with adaptive computation (right) on COCO.
  • Figure 5: Performance-efficiency trade-off curves on the CIFAR-10, CelebA, and COCO datasets. At the same layer reduction ratio as other adaptive computation methods, our method maintains much lower FID scores, indicating higher-quality generated images.
  • ...and 1 more figures
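
The diagnostic behind Figure 1 (the error of each intermediate layer relative to the final layer, averaged over samples) can be sketched as below. The data layout and the function name `layerwise_mse_to_final` are hypothetical, and the sketch assumes every layer's output has already been mapped into the same space as the final output.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def layerwise_mse_to_final(per_sample_layer_outputs):
    """per_sample_layer_outputs[s][l] is layer l's output (a flat list) for sample s.

    Returns one value per layer: the MSE between that layer's output and the
    final layer's output, averaged over all samples.
    """
    n_samples = len(per_sample_layer_outputs)
    n_layers = len(per_sample_layer_outputs[0])
    totals = [0.0] * n_layers
    for outputs in per_sample_layer_outputs:
        final = outputs[-1]
        for l, out in enumerate(outputs):
            totals[l] += mse(out, final)
    return [total / n_samples for total in totals]
```

A curve of these values that is already low well before the last layer is the kind of evidence Figure 1 uses to motivate early exits.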