Is Noise Conditioning Necessary for Denoising Generative Models?
Qiao Sun, Zhicheng Jiang, Hanhong Zhao, Kaiming He
TL;DR
The paper challenges the widely held belief that noise conditioning is essential for denoising diffusion and related generative models. By reformulating training and sampling and analyzing effective targets, posterior concentration, and sampling error, the authors show that most models degrade only gracefully, and some even improve, when noise conditioning is removed; a theoretical bound on the resulting error predicts this robustness. They introduce a noise-unconditional EDM variant (uEDM) that achieves a competitive FID of 2.23 on CIFAR-10, narrowing the gap to noise-conditional baselines. Experiments on CIFAR-10, ImageNet, and FFHQ indicate that noise conditioning is not a prerequisite for these models to work, which may inspire new noise-unconditional modeling approaches and sampling strategies. The findings offer practical guidance for model design and suggest avenues for integrating physics-based Langevin dynamics and alternative training objectives without time conditioning.
Abstract
It is widely believed that noise conditioning is indispensable for denoising diffusion models to work successfully. This work challenges this belief. Motivated by research on blind image denoising, we investigate a variety of denoising-based generative models in the absence of noise conditioning. To our surprise, most models exhibit graceful degradation, and in some cases, they even perform better without noise conditioning. We provide a theoretical analysis of the error caused by removing noise conditioning and demonstrate that our analysis aligns with empirical observations. We further introduce a noise-unconditional model that achieves a competitive FID of 2.23 on CIFAR-10, significantly narrowing the gap to leading noise-conditional models. We hope our findings will inspire the community to revisit the foundations and formulations of denoising generative models.
