Denoising Task Difficulty-based Curriculum for Training Diffusion Models

Jin-Young Kim, Hyojun Go, Soonwoo Kwon, Hyun-Gyoon Kim

TL;DR

This work resolves a long-standing ambiguity about which denoising timesteps are hardest in diffusion models by jointly analyzing convergence speed and the relative entropy between consecutive distributions, $D_{\mathrm{KL}}(p_{t-1} \,\|\, p_t)$. It introduces an easy-to-hard curriculum that clusters timesteps by noise level using SNR-based intervals and trains models progressively from easy to hard before resuming standard all-timestep training. Empirical results across unconditional, class-conditional, and text-to-image generation on FFHQ, ImageNet, CIFAR-10, and MS-COCO show improved generation quality and faster convergence, and the curriculum remains compatible with existing diffusion techniques such as loss weighting and architecture enhancements. The approach offers a practical, model-agnostic strategy to boost diffusion training efficiency and performance, highlighting the value of curriculum design across denoising timesteps.

Abstract

Diffusion-based generative models have emerged as powerful tools in the realm of generative modeling. Despite extensive research on denoising across various timesteps and noise levels, a conflict persists regarding the relative difficulties of the denoising tasks. While various studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulties, focusing on convergence behavior and changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps poses challenges characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing from curriculum learning, to enhance the training process of diffusion models. By organizing timesteps or noise levels into clusters and training models with ascending orders of difficulty, we facilitate an order-aware training regime, progressing from easier to harder denoising tasks, thereby deviating from the conventional approach of training diffusion models simultaneously across all timesteps. Our approach leads to improved performance and faster convergence by leveraging benefits of curriculum learning, while maintaining orthogonality with existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation.
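
The clustering of timesteps by noise level can be illustrated with a minimal sketch. The version below assumes a DDPM-style linear beta schedule and splits timesteps into intervals of equal width in log-SNR. Both choices, along with the function names `log_snr_linear_schedule` and `cluster_timesteps_by_snr`, are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np

def log_snr_linear_schedule(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Return log SNR(t) = log(alpha_bar_t / (1 - alpha_bar_t)) for each timestep.

    Assumes a DDPM-style linear beta schedule (an assumption, not from the source).
    """
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bar = np.cumprod(1.0 - betas)
    return np.log(alpha_bar) - np.log1p(-alpha_bar)

def cluster_timesteps_by_snr(log_snr, n_clusters):
    """Group timestep indices into n_clusters intervals of equal log-SNR width.

    clusters[0] plays the role of C_1 (highest SNR, lowest timesteps),
    clusters[-1] the role of C_N (lowest SNR, highest timesteps).
    """
    hi, lo = log_snr.max(), log_snr.min()
    # Position of each timestep in [0, 1], measured from the high-SNR end.
    frac = (hi - log_snr) / (hi - lo)
    idx = np.minimum((frac * n_clusters).astype(int), n_clusters - 1)
    return [np.flatnonzero(idx == k) for k in range(n_clusters)]

log_snr = log_snr_linear_schedule(T=1000)
clusters = cluster_timesteps_by_snr(log_snr, n_clusters=4)
for k, c in enumerate(clusters, start=1):
    print(f"C_{k}: timesteps {c.min()}..{c.max()} ({len(c)} steps)")
```

Because log-SNR falls monotonically with the timestep under this schedule, each cluster is a contiguous block of timesteps, but the blocks are generally of unequal size.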

Paper Structure

This paper contains 50 sections, 5 equations, 18 figures, 6 tables, 2 algorithms.

Figures (18)

  • Figure 1: Loss and FID convergence plotted during training for each diffusion model $\mathrm{M}_i$ in DiT, EDM, and SiT. Since the loss scale for each model is different, we show the normalized value. We observe that as $i$ increases (i.e., corresponding to larger denoising timesteps), the loss converges more rapidly, and this convergence speed correlates with that of the FID scores.
  • Figure 2: The KL divergence (KLD) of $p_{t-1}$ from $p_t$ plotted against the denoising timestep. The divergence decreases as the timestep increases.
  • Figure 3: The overview of our curriculum learning approach for diffusion models. (Left) We divide the timesteps into $N$ clusters, $\{C_1, \ldots, C_N\}$, with the difficulty of denoising tasks increasing from $C_N$ (easiest) to $C_1$ (hardest). (Right) As the curriculum progresses, learning accumulates harder task clusters, gradually increasing task difficulties.
  • Figure 4: Ablation study on $N$ and $\tau$. We use DiT-B on ImageNet 256$\times$256.
  • Figure 5: Curriculum transitions and the corresponding loss across iterations ($N=20$). For readability, the y-axis of the loss graph is truncated at 1.0.
  • ...and 13 more figures
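
The accumulation scheme described in the Figure 3 caption can be sketched as a timestep sampler. This is a rough illustration only: the fixed `iters_per_stage` pacing, the equal-size clusters in the usage lines, and the function names `active_timesteps` and `sample_timesteps` are hypothetical choices, and the pacing rule the paper actually uses (including the role of $\tau$) is not specified in this excerpt.

```python
import numpy as np

def active_timesteps(clusters, stage):
    """Return the timesteps trained at a given curriculum stage (1-based).

    clusters[0] is C_1 (hardest) and clusters[-1] is C_N (easiest).
    Stage s uses the s easiest clusters: C_N, C_{N-1}, ..., C_{N-s+1}.
    Stages beyond N fall back to all timesteps (standard training).
    """
    n = len(clusters)
    stage = max(1, min(stage, n))
    return np.concatenate(clusters[n - stage:])

def sample_timesteps(clusters, iteration, iters_per_stage, batch_size, rng):
    """Sample a batch of timesteps uniformly from the currently active pool.

    Hypothetical pacing: advance one stage every `iters_per_stage` iterations.
    """
    stage = iteration // iters_per_stage + 1
    pool = active_timesteps(clusters, stage)
    return rng.choice(pool, size=batch_size)

# Usage with placeholder equal-size clusters over T = 1000 timesteps.
clusters = np.array_split(np.arange(1000), 4)
rng = np.random.default_rng(0)
early = sample_timesteps(clusters, iteration=0, iters_per_stage=100,
                         batch_size=8, rng=rng)
late = sample_timesteps(clusters, iteration=10_000, iters_per_stage=100,
                        batch_size=8, rng=rng)
print("early batch:", early)  # drawn only from the easiest cluster
print("late batch:", late)    # drawn from all timesteps
```

Once the final stage is reached, the pool covers every timestep, which matches the return to standard all-timestep training after the curriculum.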