Is Your Conditional Diffusion Model Actually Denoising?
Daniel Pfrommer, Zehao Dou, Christopher Scarvelis, Max Simchowitz, Ali Jadbabaie
TL;DR
The paper shows that conditional diffusion models inherently exhibit non-denoising behavior, quantified by Schedule Deviation (SD), a measure of deviation from the model-consistent diffusion path. SD can be computed without access to the true score or the training data, and it is strongly predictive of disagreements between samplers such as DDPM and DDIM. Empirical results across multiple datasets show that SD is pervasive and persists even with larger models and more training data, while a theoretical framework attributes it to smoothing-based self-guidance across conditioning variables. Toy datasets and analytic results confirm that self-guidance can cause interpolated flows to deviate from denoising. This suggests a fundamental bias in conditional diffusion, with implications for sampling, distillation, and the interpretation of diffusion-based methods.
Abstract
We study the inductive biases of diffusion models with a conditioning variable, which are widely used both as text-conditioned generative image models and as observation-conditioned continuous control policies. We observe that when these models are queried conditionally, their generations consistently deviate from the idealized "denoising" process on which diffusion models are formulated, inducing disagreement between popular sampling algorithms (e.g., DDPM and DDIM). We introduce Schedule Deviation, a rigorous measure that captures the rate of deviation from a standard denoising process, and provide a methodology to compute it. Crucially, we demonstrate that the deviation from an idealized denoising process occurs irrespective of model capacity or the amount of training data. We posit that this phenomenon occurs due to the difficulty of bridging distinct denoising flows across different parts of the conditioning space, and we show theoretically how such a phenomenon can arise through an inductive bias towards smoothness.
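A one-dimensional toy model can illustrate why a model that does not follow a single denoising process can make DDPM and DDIM disagree. This is a hedged sketch under stated assumptions, not the paper's Schedule Deviation computation. It assumes the following setup:

- a variance-exploding parameterization x = x0 + σz with data x0 ~ N(0, 1);
- a hypothetical linear denoiser D(x, σ) = k(σ)x;
- a DDIM step x' = D + (σ'/σ)(x − D);
- a DDPM-style ancestral step that samples the Gaussian posterior q(x' | x, x0 = D).

Because every step is linear-Gaussian, the code propagates the output variance exactly instead of drawing samples. A consistent denoiser uses the exact posterior mean for N(0, 1) at every noise level. A hypothetical "inconsistent" denoiser has an implied data variance that jumps from 1 to 4 at σ = 1, so it matches no single denoising process. The function names, the noise grid, and the jump are illustrative assumptions.

```python
def denoiser_gain(sigma, consistent):
    """Gain k of a hypothetical linear denoiser D(x, sigma) = k * x.

    For 1-D data x0 ~ N(0, s2) noised as x = x0 + sigma * z, the exact
    denoiser E[x0 | x] equals s2 / (s2 + sigma**2) * x.
    consistent=True : s2 = 1 at every noise level (a true denoiser).
    consistent=False: s2 jumps from 1 (sigma > 1) to 4 (sigma <= 1), so the
    denoiser matches no single data distribution (assumed for illustration).
    """
    s2 = 1.0 if (consistent or sigma > 1.0) else 4.0
    return s2 / (s2 + sigma**2)


def final_variance(sampler, consistent, sigma_max=80.0, sigma_min=2e-3, n_steps=2000):
    """Exact variance of the sampler output; every step is linear-Gaussian."""
    # Geometric noise grid from sigma_max down to sigma_min (an assumed schedule).
    sigmas = [sigma_max * (sigma_min / sigma_max) ** (i / n_steps)
              for i in range(n_steps + 1)]
    var = 1.0 + sigma_max**2  # marginal variance at sigma_max when s2 = 1
    for s0, s1 in zip(sigmas[:-1], sigmas[1:]):
        k = denoiser_gain(s0, consistent)
        if sampler == "ddim":
            # Deterministic step x' = D + (s1/s0) * (x - D), i.e. Euler on the PF-ODE.
            m = k + (s1 / s0) * (1.0 - k)
            var = m**2 * var
        elif sampler == "ddpm":
            # Ancestral step: mean D + (s1/s0)**2 * (x - D), plus fresh Gaussian
            # noise with variance s1**2 * (s0**2 - s1**2) / s0**2.
            m = k + (s1 / s0) ** 2 * (1.0 - k)
            var = m**2 * var + s1**2 * (s0**2 - s1**2) / s0**2
        else:
            raise ValueError(f"unknown sampler: {sampler}")
    return var


if __name__ == "__main__":
    for consistent in (True, False):
        print(f"consistent={consistent}: "
              f"DDIM var={final_variance('ddim', consistent):.3f}, "
              f"DDPM var={final_variance('ddpm', consistent):.3f}")
```

With the consistent denoiser, both samplers end close to the true variance 1. With the jump, DDIM ends near 1.6 and DDPM near 2.1, the continuous-time limits of the two recurrences being 8/5 and 52/25. Neither sampler recovers either implied data distribution, and the two disagree. In this toy, the stochastic sampler pulls the variance toward the new noise-level-consistent value faster than the deterministic one, which is one simple mechanism for such disagreement.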
