Improving Diffusion Models's Data-Corruption Resistance using Scheduled Pseudo-Huber Loss

Artem Khrapov, Vadim Popov, Tasnima Sadekova, Assel Yermekova, Mikhail Kudinov

TL;DR

The paper tackles diffusion models' sensitivity to outliers in training data by introducing a time-dependent pseudo-Huber loss, which enables robust learning without data filtering. Scheduling the delta parameter balances robustness during the early reverse-diffusion steps against fine-detail reconstruction in the later steps. The authors provide theoretical motivation and demonstrate improvements on image and audio tasks, showing greater resilience to corrupted data and less reliance on dataset purification. Because only the loss function changes, the approach is a practical, low-cost route to robust diffusion training on real-world datasets that may contain corrupted samples.

Abstract

Diffusion models are known to be vulnerable to outliers in training data. In this paper we study an alternative diffusion loss function that preserves the high quality of generated data, like the original squared $L_{2}$ loss, while also being robust to outliers. We propose to use the pseudo-Huber loss function with a time-dependent parameter, which allows a trade-off between robustness on the most vulnerable early reverse-diffusion steps and fine-detail restoration on the final steps. We show that the pseudo-Huber loss with the time-dependent parameter performs better on corrupted datasets in both the image and audio domains. In addition, the loss function we propose can potentially help diffusion models resist dataset corruption without the data filtering or purification that conventional training algorithms require.
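The idea in the abstract can be illustrated with a minimal sketch. The pseudo-Huber penalty below is the standard textbook form, $\delta^{2}\left(\sqrt{1 + r^{2}/\delta^{2}} - 1\right)$, which behaves like $r^{2}/2$ for small residuals and like $\delta |r|$ for large ones. The schedule, the function names, and the default `delta_final=0.01` are assumptions made for illustration only; the paper's exact schedule and loss scaling are not given in this summary and may differ.

```python
import math


def pseudo_huber(residual, delta):
    """Standard pseudo-Huber penalty: delta^2 * (sqrt(1 + (r / delta)^2) - 1)."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


def delta_schedule(t, num_steps, delta_final=0.01):
    """Hypothetical exponential schedule (an assumption, not the paper's formula).

    Returns 1.0 at t = 0 (least noisy step, L2-like behaviour) and decays to
    delta_final at t = num_steps (noisiest step, more L1-like and more robust).
    """
    return math.exp(math.log(delta_final) * t / num_steps)


def scheduled_pseudo_huber_loss(pred, target, t, num_steps, delta_final=0.01):
    """Mean pseudo-Huber loss over two equal-length sequences at timestep t."""
    delta = delta_schedule(t, num_steps, delta_final)
    total = sum(pseudo_huber(p - q, delta) for p, q in zip(pred, target))
    return total / len(pred)


if __name__ == "__main__":
    target = [0.0, 0.0, 0.0, 0.0]
    pred = [0.1, -0.1, 0.05, 100.0]  # the last element acts as an outlier
    for t in (0, 500, 1000):
        loss = scheduled_pseudo_huber_loss(pred, target, t, num_steps=1000)
        print(f"t={t:4d}  delta={delta_schedule(t, 1000):.4f}  loss={loss:.4f}")
```

In this sketch the outlier dominates the loss at `t = 0`, where the penalty grows roughly quadratically for moderate residuals, and contributes far less at `t = num_steps`, where the small `delta` makes the penalty grow only linearly. This mirrors the trade-off between robustness and fine-detail restoration described in the abstract.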

Paper Structure (18 sections, 13 equations, 10 figures, 1 table)

This paper contains 18 sections, 13 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Scheme of the process. Off-topic images are added to a clean dataset of cat photos. Training with the L2 loss function leads to concept distortion and even erasure (see the fractal-like structures in the second row), whereas the results of training with the Huber loss stay consistent with their uncorrupted counterparts.
  • Figure 2: Plot of the resilience factor for the 1 - LPIPS similarity across all tested text2image prompts at different levels of corruption, for the selected $\delta = 0.01$.
  • Figure 3: Speaker similarity at different iterations, averaged across speakers. Models: clean - trained on the clean dataset; $l_2$ and huber scheduled - trained on the mixed dataset with the corresponding losses.
  • Figure 4: Number of synthesized samples with similarity below the corresponding threshold. The total number of samples is $1260$, taken from the best checkpoints on $350$ iterations.
  • Figure 5: A study of the LPIPS-calculated resilience of various pseudo-Huber loss schedulers, averaged across all the prompts used and computed for different amounts of corruption. (Pseudo-)Huber losses with the postfix "old" are the implementation of (\ref{pseudo-huber-loss-diffusers}), while those without it are the corrected versions (\ref{eq:pseudo-huber-loss}). The schedules used are the backward and the forward (unmarked) versions of the exponential schedule (\ref{exp-schedule}). The backward version fails significantly, while our final adopted schedule outperforms L2 and all the other tested variants.
  • ...and 5 more figures