Unmasking Bias in Diffusion Model Training

Hu Yu, Li Shen, Jie Huang, Hongsheng Li, Feng Zhao

TL;DR

This work identifies a fundamental bias in the standard $\epsilon$-prediction objective with constant weighting for diffusion models, showing that it induces biased estimation of $x_0$ and contributes to color shifts and unstable early sampling. It derives a principled loss weighting based on $1/\sqrt{\text{SNR}(t)}$, which emphasizes the high-noise timesteps that sampling visits first and counteracts the amplification of the bias there, giving the objective $L = \sum_t \mathbb{E}_{x_0,\epsilon} \left[ \frac{1}{\sqrt{\text{SNR}(t)}} \|\epsilon - \epsilon_\theta(x_t,t)\|^2 \right]$. Through theoretical analysis and extensive experiments on multiple datasets (e.g., FFHQ, CelebA-HQ, AFHQ-dog, MetFaces; CIFAR-10; ImageNet) and sampling regimes, the proposed weighting yields substantial improvements in sample quality (FID/IS) and in training and sampling efficiency, with minimal changes to existing diffusion-model pipelines. The results offer a unified lens on prior weighting strategies and suggest that bias-aware loss design can meaningfully advance diffusion-model performance in practice.
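As a rough illustration of the objective above, the following sketch computes the $1/\sqrt{\text{SNR}(t)}$ weight and applies it to a squared $\epsilon$-prediction error. The linear beta schedule, its default values, and all function names are assumptions made for this example; they are not taken from the paper or its released code.

```python
import math


def linear_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an ASSUMED linear beta schedule."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def snr(alpha_bar_t):
    """SNR of x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    return alpha_bar_t / (1.0 - alpha_bar_t)


def loss_weight(alpha_bar_t):
    """The 1/sqrt(SNR(t)) weight that appears in the objective."""
    return 1.0 / math.sqrt(snr(alpha_bar_t))


def weighted_eps_loss(eps, eps_pred, alpha_bar_t):
    """Weighted squared error for one sample; eps and eps_pred are flat lists."""
    sq_err = sum((e - p) ** 2 for e, p in zip(eps, eps_pred))
    return loss_weight(alpha_bar_t) * sq_err
```

Under this assumed schedule the weight grows monotonically with $t$, so errors made at noisy timesteps are penalized more heavily than under constant weighting.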

Abstract

Denoising diffusion models have emerged as a dominant approach for image generation; however, they still suffer from slow convergence in training and color shift issues in sampling. In this paper, we identify that these obstacles can be largely attributed to bias and suboptimality inherent in the default training paradigm of diffusion models. Specifically, we offer theoretical insights that the prevailing constant loss weight strategy in $\epsilon$-prediction of diffusion models leads to biased estimation during the training phase, hindering accurate estimation of the original images. To address the issue, we propose a simple but effective weighting strategy derived from the biased part that we uncover. Furthermore, we conduct a comprehensive and systematic exploration, unraveling the inherent bias problem in terms of its existence, impact, and underlying reasons. These analyses contribute to advancing the understanding of diffusion models. Empirical results demonstrate that our method remarkably elevates sample quality and displays improved efficiency in both the training and sampling processes, by adjusting only the loss weighting strategy. The code is released publicly at https://github.com/yuhuUSTC/Debias
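The link between $\epsilon$-prediction error and the estimate of the original image can be made explicit with a short derivation. It assumes the standard DDPM forward process and the notation $\bar{\alpha}_t$ for the cumulative signal coefficient, neither of which is spelled out in the text above, so treat it as a sketch rather than the paper's own derivation.

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \qquad \text{SNR}(t) = \frac{\bar{\alpha}_t}{1-\bar{\alpha}_t}$$

$$\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\, \epsilon_\theta(x_t,t)}{\sqrt{\bar{\alpha}_t}} = x_0 + \frac{1}{\sqrt{\text{SNR}(t)}} \left( \epsilon - \epsilon_\theta(x_t,t) \right)$$

Under these assumptions, a fixed error in the predicted noise is scaled by $1/\sqrt{\text{SNR}(t)}$ when converted into an error in $\hat{x}_0$, and this factor becomes large at high-noise timesteps where $\text{SNR}(t)$ is small.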

Paper Structure (22 sections, 13 equations, 17 figures, 4 tables)

Figures (17)

  • Figure 1: Examples for the bias problem in $\epsilon$-prediction with constant weighting. Images are generated with different total sampling steps $T$. The upper two rows showcase samples obtained through constant weighting, exhibiting color shift and poor details. The bottom ones display samples generated using our method.
  • Figure 2: Left: Visualization of $\text{SNR}(t)$ and the amplification coefficient $\frac{1}{\sqrt{\text{SNR}(t)}}$ at different timesteps. Right: The top row shows the input $x_t$ at different timesteps. We use the diffusion model of Dhariwal and Nichol (2021), pretrained on the ImageNet dataset, to obtain the "estimated $\hat{x}_0$" part and the "amplified error" part of each input $x_t$. The second row shows the estimated $\hat{x}_0$, and the bottom row shows the corresponding amplified error. As step $t$ grows, the estimated $\hat{x}_0$ deviates severely from $x_0$, while the amplified error part gradually approaches $x_0$.
  • Figure 3: One-step estimation results of $\hat{x}_0$ for different input samples $x_t$, where the diffusion models are pretrained on the FFHQ dataset (Karras et al., 2019) with different loss weighting strategies. One-step estimation: start from a clean image and add noise to obtain $x_t$ according to Eq. (3). Then pass $x_t$ through the denoising network once to get the estimated noise $\hat{\epsilon}$ and the corresponding $\hat{x}_0$. The top row displays the results obtained with a well-trained constant weighting model, while the bottom row shows the results achieved with our well-trained improved weighting model.
  • Figure 4: MSE-step curves under several settings. The "Initial" mode is calculated between input and target; the optimization difficulty clearly differs vastly across steps $t$. The "Constant" and "Ours" modes are calculated between network output and target, where "Constant" denotes the constant weight strategy and "Ours" denotes our proposed weight strategy. Note that the green and red curves visually overlap in the left figure because of the large scale.
  • Figure 5: Left: Visualization of various weighting strategies. P2 and Min-SNR start from constant weighting and lower the weight for small $t$. Right: Sampling results with different total sampling steps $T$. From top to bottom: constant, P2, Min-SNR, and our method. P2 and Min-SNR still suffer from bias and artifacts during the initial generation stage.
  • ...and 12 more figures
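
The one-step estimation described in the captions for Figures 2 and 3 can be sketched as follows. The forward-noising formula is the standard DDPM one, which we assume is what Eq. (3) refers to; the function names and the toy `denoiser` callable are hypothetical stand-ins for a trained network and do not come from the paper's code.

```python
import math
import random


def add_noise(x0, eps, alpha_bar_t):
    """ASSUMED forward process: x_t = sqrt(ab) * x_0 + sqrt(1 - ab) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]


def estimate_x0(x_t, eps_pred, alpha_bar_t):
    """Invert the forward process using the predicted noise."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - b * e) / a for x, e in zip(x_t, eps_pred)]


def one_step_estimate(x0, alpha_bar_t, denoiser, rng):
    """Noise a clean sample once, call the denoiser once, and recover x0_hat."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, eps, alpha_bar_t)
    eps_pred = denoiser(x_t)  # hypothetical stand-in for the trained network
    return estimate_x0(x_t, eps_pred, alpha_bar_t), eps, eps_pred


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -0.25, 1.0]
    alpha_bar_t = 0.2  # a fairly noisy timestep, chosen only for illustration
    # Deliberately poor toy denoiser that always predicts zero noise.
    x0_hat, eps, eps_pred = one_step_estimate(
        x0, alpha_bar_t, lambda x_t: [0.0] * len(x_t), rng
    )
    amplification = math.sqrt((1.0 - alpha_bar_t) / alpha_bar_t)  # 1/sqrt(SNR)
    for xh, x, e, p in zip(x0_hat, x0, eps, eps_pred):
        # The x0 error equals the noise-prediction error times 1/sqrt(SNR).
        print(round(xh - x, 6), round(amplification * (e - p), 6))
```

With the toy denoiser, the two printed columns agree, which shows numerically how the noise-prediction error is scaled by the amplification coefficient when it is converted into an error in $\hat{x}_0$.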