Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg

TL;DR

This paper introduces a new self-supervised training objective that differentiates the levels of noise added to a sample. The objective improves OOD denoising performance and significantly improves both the performance and the speed of parallel samplers.

Abstract

Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in these OOD regions. This can become problematic for all sampling methods, especially when we move to parallel sampling which requires us to initialize and update the entire sample trajectory of dynamics in parallel, leading to many OOD evaluations. To address this problem, we introduce a new self-supervised training objective that differentiates the levels of noise added to a sample, leading to improved OOD denoising performance. The approach is based on our observation that diffusion models implicitly define a log-likelihood ratio that distinguishes distributions with different amounts of noise, and this expression depends on denoiser performance outside the standard training distribution. We show by diverse experiments that the proposed contrastive diffusion training is effective for both sequential and parallel settings, and it improves the performance and speed of parallel samplers significantly.
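
The idea that a log-likelihood ratio can distinguish distributions with different amounts of noise can be illustrated with a toy case. The sketch below is not the paper's training objective: it assumes 1D standard Gaussian data, so the noisy marginals are available in closed form, and all function names and noise levels are hypothetical choices made for illustration.

```python
import numpy as np

def log_density_noisy(x, sigma):
    # Assumption: data ~ N(0, 1) and additive noise ~ N(0, sigma^2),
    # so the noisy sample is distributed as N(0, 1 + sigma^2).
    var = 1.0 + sigma ** 2
    return -0.5 * (np.log(2.0 * np.pi * var) + x ** 2 / var)

def log_likelihood_ratio(x, sigma_a, sigma_b):
    # Positive values favor noise level sigma_a over sigma_b.
    return log_density_noisy(x, sigma_a) - log_density_noisy(x, sigma_b)

def prefers_first_level(x, sigma_a, sigma_b):
    # Bayes-optimal decision under equal priors on the two noise levels.
    return log_likelihood_ratio(x, sigma_a, sigma_b) > 0.0

rng = np.random.default_rng(0)
n = 20000
sigma_low, sigma_high = 0.5, 3.0  # hypothetical noise levels

clean = rng.standard_normal(n)
x_low = clean + sigma_low * rng.standard_normal(n)
x_high = clean + sigma_high * rng.standard_normal(n)

# Balanced accuracy of the likelihood-ratio noise classifier.
correct_low = prefers_first_level(x_low, sigma_low, sigma_high).mean()
correct_high = (~prefers_first_level(x_high, sigma_low, sigma_high)).mean()
accuracy = 0.5 * (correct_low + correct_high)
print(f"balanced accuracy: {accuracy:.3f}")
```

In a diffusion model the densities are not known in closed form, and according to the abstract the corresponding ratio depends on denoiser performance outside the standard training distribution.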

Paper Structure (25 sections, 33 equations, 15 figures, 7 tables, 1 algorithm)

Figures (15)

  • Figure 1: We plot the error in the score estimate for a 1D two-mode Gaussian example where the diffusion dynamics bridge between a Gaussian and a mixture (see Appendix \ref{sec:1d_gauss_exp}). Regions near the standard forward training data paths have lower error magnitude (light), whereas other areas have higher error magnitude (dark). While sequential samplers adhere as closely as possible to low-error regions, parallel samplers initialize and update the entire sample trajectory (blue trajectories), leading to evaluations in high-error regions. When the sampling trajectory is initialized, most of its points inevitably lie in OOD regions and only gradually move to low-error regions as the updates proceed.
  • Figure 2: The computation graph of Picard iteration for parallel sampling \cite{shih2024parallel}.
  • Figure 3: Dino data generated by the parallel DDPM sampler. Compared with samples from the model trained with the DDPM loss, samples from the model trained with the CDL loss have better sample quality and better density estimates around hard areas.
  • Figure 4: Parallel sampling MMD plot. The CDL-regularized model and the DDPM model converge at 35 and 36 Picard iterations, respectively.
  • Figure 5: The FID comparison between our CDL and the baseline EDM in the deterministic sampler experiment.
  • ...and 10 more figures
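
The Picard iteration mentioned in the Figure 2 caption can be sketched on a toy recursion. This is a minimal illustration, not the paper's sampler: `drift` is a hypothetical stand-in for a learned denoiser-based update, and the step size, step count, and tolerance are arbitrary assumptions. It shows the property described in the abstract, namely that the whole trajectory is initialized and updated in parallel, so early iterations evaluate the drift at points far from the final path.

```python
import numpy as np

def drift(x, t):
    # Hypothetical drift; a real sampler would call the trained denoiser here.
    return -x + np.sin(t)

def sequential_solve(x0, num_steps, h):
    # Standard sequential recursion: x[i+1] = x[i] + h * drift(x[i], t[i]).
    xs = np.empty(num_steps + 1)
    xs[0] = x0
    for i in range(num_steps):
        xs[i + 1] = xs[i] + h * drift(xs[i], i * h)
    return xs

def picard_solve(x0, num_steps, h, tol=1e-10):
    # Picard iteration: x[t] <- x0 + sum_{i<t} h * drift(x_prev[i], t[i]).
    ts = np.arange(num_steps) * h
    xs = np.full(num_steps + 1, x0, dtype=float)  # initialize the whole trajectory
    for k in range(1, num_steps + 1):
        increments = h * drift(xs[:-1], ts)  # all time steps evaluated at once
        new_xs = np.concatenate(([x0], x0 + np.cumsum(increments)))
        if np.max(np.abs(new_xs - xs)) < tol:
            return new_xs, k
        xs = new_xs
    # After num_steps iterations the iterate matches the sequential solution.
    return xs, num_steps

x0, num_steps, h = 2.0, 50, 0.05
reference = sequential_solve(x0, num_steps, h)
parallel, iterations = picard_solve(x0, num_steps, h)
print("Picard iterations used:", iterations)
print("max deviation from sequential:", np.max(np.abs(parallel - reference)))
```

Each Picard sweep evaluates the drift at every time step simultaneously, which is what makes the scheme parallelizable; the cost is that intermediate iterates are not yet on the true trajectory.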