Improved Noise Schedule for Diffusion Training

Tiankai Hang, Shuyang Gu, Xin Geng, Baining Guo

TL;DR

A novel approach to design the noise schedule for enhancing the training of diffusion models by exploiting the importance sampling of the logarithm of the Signal-to-Noise ratio, which allows the model to focus on the critical transition point between signal dominance and noise dominance.

Abstract

Diffusion models have emerged as the de facto choice for generating high-quality visual signals across various domains. However, training a single model to predict noise across various levels poses significant challenges, necessitating numerous iterations and incurring significant computational costs. Various approaches, such as loss weighting strategy design and architectural refinements, have been introduced to expedite convergence and improve model performance. In this study, we propose a novel approach to design the noise schedule for enhancing the training of diffusion models. Our key insight is that the importance sampling of the logarithm of the Signal-to-Noise ratio ($\log \text{SNR}$), theoretically equivalent to a modified noise schedule, is particularly beneficial for training efficiency when increasing the sample frequency around $\log \text{SNR}=0$. This strategic sampling allows the model to focus on the critical transition point between signal dominance and noise dominance, potentially leading to more robust and accurate predictions. We empirically demonstrate the superiority of our noise schedule over the standard cosine schedule. Furthermore, we highlight the advantages of our noise schedule design on the ImageNet benchmark, showing that the designed schedule consistently benefits different prediction targets. Our findings contribute to the ongoing efforts to optimize diffusion models, potentially paving the way for more efficient and effective training paradigms in the field of generative AI.
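
To make the idea of sampling more often around $\log \text{SNR}=0$ concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a uniform timestep $t \in (0, 1)$ is pushed through the inverse CDF of a Laplace distribution with location `mu` and scale `b` to obtain $\log \text{SNR}$, and that a variance-preserving parameterization ($\alpha^2 + \sigma^2 = 1$) is used. The function names `laplace_logsnr`, `cosine_logsnr`, and `logsnr_to_alpha_sigma` are hypothetical, as is the default `b=0.5`.

```python
import numpy as np


def laplace_logsnr(t, mu=0.0, b=0.5):
    """Map t in (0, 1) to log SNR so that uniform t yields Laplace(mu, b) samples.

    Assumed form: inverse CDF of the Laplace distribution, decreasing in t
    (t near 0 is almost clean data, t near 1 is almost pure noise).
    """
    t = np.asarray(t, dtype=np.float64)
    return mu - b * np.sign(0.5 - t) * np.log(1.0 - 2.0 * np.abs(t - 0.5))


def cosine_logsnr(t):
    """log SNR of the standard cosine schedule, for comparison."""
    t = np.asarray(t, dtype=np.float64)
    return -2.0 * np.log(np.tan(np.pi * t / 2.0))


def logsnr_to_alpha_sigma(logsnr):
    """Variance-preserving assumption: alpha^2 = sigmoid(logsnr), sigma^2 = sigmoid(-logsnr)."""
    logsnr = np.asarray(logsnr, dtype=np.float64)
    alpha = np.sqrt(1.0 / (1.0 + np.exp(-logsnr)))
    sigma = np.sqrt(1.0 / (1.0 + np.exp(logsnr)))
    return alpha, sigma


# Draw uniform timesteps away from the endpoints, where log SNR diverges.
rng = np.random.default_rng(0)
t = rng.uniform(1e-6, 1.0 - 1e-6, size=100_000)

lam_laplace = laplace_logsnr(t)
lam_cosine = cosine_logsnr(t)

# Fraction of training samples whose log SNR lies within one unit of zero.
frac_laplace = float(np.mean(np.abs(lam_laplace) < 1.0))
frac_cosine = float(np.mean(np.abs(lam_cosine) < 1.0))
print(f"|log SNR| < 1: laplace={frac_laplace:.3f}, cosine={frac_cosine:.3f}")

# Noisy input for a batch x0 under the assumed parameterization: x_t = alpha * x0 + sigma * eps
alpha, sigma = logsnr_to_alpha_sigma(lam_laplace)
```

Under these assumptions, the Laplace mapping places a much larger share of training samples near the transition between signal dominance and noise dominance than the cosine schedule does: analytically, $1 - e^{-2} \approx 0.86$ of the samples fall in $|\log \text{SNR}| < 1$ for `b=0.5`, versus roughly $0.31$ for the cosine schedule.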

Paper Structure

This paper contains 23 sections, 29 equations, 8 figures, and 11 tables.

Figures (8)

  • Figure 1: Illustration of the probability density functions of different noise schedules.
  • Figure 2: Comparison between adjusting the noise schedule, adjusting the loss weights and baseline setting. The Laplace noise schedule yields the best results and the fastest convergence speed.
  • Figure 3: FID-10K results on ImageNet-256 with location parameter $\mu$ fixed to 0 and different Laplace distribution scales $b$ in $\{0.25, 0.5, 1.0, 2.0, 3.0\}$. Baseline denotes standard cosine schedule.
  • Figure 4: Visualization of $p(\lambda)$ for Laplace schedule and cosine schedule with polynomial timestep sampling.
  • Figure 5: Comparison of probability density functions for different flow matching approaches. The plot shows three distributions: Flow Matching with Logit-Normal sampling (blue), Flow Matching without Logit-Normal sampling (green), and the Cosine schedule (orange).
  • ...and 3 more figures