Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models

Wenhao Li, Xiu Su, Yu Han, Shan You, Tao Huang, Chang Xu

TL;DR

This paper tackles the inefficiency of diffusion models relying on a single denoiser across all timesteps, where distributions and task difficulty vary substantially. It introduces TDC Training, a two-stage divide-and-conquer framework that groups timesteps by difficulty using the SNR-based measure $SNR = 10 \log_{10}\left(\frac{\overline{\alpha}_t}{1-\overline{\alpha}_t}\right)$ and allocates progressive FLOPs via $FLOPs_{g}(i)=\left(\frac{i}{\mathcal{N}}+\frac{\mathcal{N}-i}{\mathcal{N}}\times k\right)\mathcal{F}$, followed by deriving group-specific denoisers through Proxy-based Pruning with GPT-4 and a memory bank for iterative refinement. The approach yields substantial FID improvements (e.g., $0.32$ on CIFAR10, $1.5$ on ImageNet64, $0.27$ on FFHQ) while reducing compute by about 20% across IDDPM and LDM. A two-stage training strategy outperforms single-stage counterparts and proves robust to FLOPs budgeting ($k$), with pruning stability aided by the memory mechanism. Overall, the method provides a practical, scalable path to task-aware diffusion with meaningful efficiency gains.
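The two formulas above can be evaluated directly. The following is a minimal sketch, not the authors' implementation. It assumes a linear beta schedule (1e-4 to 0.02 over 1000 steps) to obtain $\overline{\alpha}_t$, and it assumes the group index $i$ runs from 1 to $\mathcal{N}$ with $0 < k \le 1$; the summary does not state either choice. The function names `alpha_bar_linear`, `snr_db`, and `group_flops` are hypothetical.

```python
import math


def alpha_bar_linear(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) under an ASSUMED linear beta schedule."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def snr_db(alpha_bar):
    """SNR = 10 * log10(alpha_bar / (1 - alpha_bar)), as given in the TL;DR."""
    return 10.0 * math.log10(alpha_bar / (1.0 - alpha_bar))


def group_flops(i, num_groups, k, base_flops):
    """FLOPs_g(i) = (i/N + (N - i)/N * k) * F, as given in the TL;DR.

    ASSUMPTION: i ranges over 1..N, so group N receives the full budget F
    and lower-indexed groups are scaled toward k * F.
    """
    return (i / num_groups + (num_groups - i) / num_groups * k) * base_flops


# Per-timestep SNR in dB for a 1000-step schedule (assumed length).
alpha_bars = alpha_bar_linear(1000)
snrs = [snr_db(a) for a in alpha_bars]

# FLOPs budgets for 4 groups with k = 0.5 and a base cost of 100 GFLOPs
# (all three values are illustrative, not taken from the paper).
budgets = [group_flops(i, 4, 0.5, 100.0) for i in range(1, 5)]
print(budgets)
```

Under these assumptions the SNR falls monotonically as the timestep index grows, and the budgets rise linearly from just above $k\mathcal{F}$ to $\mathcal{F}$. How the paper orders groups by difficulty, and therefore which group receives the full budget, is not specified in this summary.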

Abstract

Diffusion models have demonstrated remarkable efficacy in various generative tasks, driven by the predictive power of the denoising model. Currently, diffusion models employ a uniform denoising model across all timesteps. However, the inherent variation in data distributions across timesteps leads to conflicts during training, constraining the potential of diffusion models. To address this challenge, we propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty and assigns a highly customized denoising model to each group, thereby enhancing the performance of diffusion models. Two-stage training avoids the need to train each model separately, and the total training cost is even lower than that of training a single unified denoising model. Additionally, we introduce Proxy-based Pruning to further customize the denoising models. This method recasts the pruning of diffusion models as a multi-round decision-making problem, enabling precise pruning. Our experiments validate the effectiveness of TDC Training, demonstrating an FID improvement of 1.5 on ImageNet64 over the original IDDPM while saving about 20% of computational resources.

Paper Structure

This paper contains 12 sections, 12 equations, 8 figures, 6 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Visualization of Diffusion Model Performance: Circle sizes represent computational costs (GFLOPs) while vertical positioning indicates FID scores.
  • Figure 2: Pipeline of Our TDC Training Strategy. First, SNR for each timestep is calculated to estimate the difficulty of the denoising task. Timesteps are then grouped based on task difficulty, and model capacity is allocated accordingly. During training, a base model covering all timesteps is trained in the first phase. In the second phase, for each group, Proxy-based Pruning is applied to the base model according to the allocated model capacity, and then fine-tuning is performed on the timesteps within each group to obtain specialized models for each group.
  • Figure 3: Comparison of FID and Training Steps Across Different Training Strategies
  • Figure 4: Sample images of LDM on FFHQ with (top) and without (bottom) our TDC Training (100 sampling steps).
  • Figure 5: Mean-std Curve over Pruning Rounds.
  • ...and 3 more figures