Towards a Mechanistic Explanation of Diffusion Model Generalization
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy, Frank Wood
TL;DR
The paper investigates why diffusion models generalize well beyond their training data by contrasting pretrained network denoisers with the empirically optimal denoiser, uncovering a persistent local inductive bias across architectures. It posits that neural denoisers operate via localized denoising that, when aggregated across patches, approximates the global optimal denoiser for much of the forward process. The authors formalize this intuition with Patch Set Posterior Composite (PSPC) denoisers, including PSPC-Square and PSPC-Flex, which compute patch posterior means over spatial crops and combine them to match network outputs. PSPC exhibits strong alignment with network denoisers in forward and reverse diffusion, yielding samples that resemble neural-network outputs while remaining training-free, with implications for attribution, efficiency, and non-neural diffusion strategies.
Abstract
We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods.
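To make the idea of aggregating local empirical denoisers concrete, the following is a minimal sketch and not the authors' PSPC implementation. It assumes a variance-exploding forward process z = x + sigma * noise, single-channel images, a fixed square patch size, training crops taken only from the same spatial location as the noisy crop, and uniform averaging of overlapping patch estimates. The function names `empirical_denoiser` and `patch_composite_denoiser` are hypothetical.

```python
import numpy as np


def empirical_denoiser(z, train, sigma):
    """Posterior mean E[x | z] when x is uniform over `train` and z = x + sigma * noise.

    z:     noisy array of shape S
    train: training arrays of shape (N, *S)
    """
    flat = train.reshape(len(train), -1)
    sq_dist = np.sum((flat - z.reshape(1, -1)) ** 2, axis=1)
    # Softmax over training points, computed stably in log space.
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max()
    weights = np.exp(logits)
    weights /= weights.sum()
    return (weights @ flat).reshape(z.shape)


def patch_composite_denoiser(z, train, sigma, patch=3):
    """Average per-patch empirical denoisers over all square crops.

    Assumption (illustrative only): each crop of z is denoised against the
    training crops at the same location, and overlaps are averaged uniformly.
    """
    height, width = z.shape
    out = np.zeros((height, width), dtype=float)
    count = np.zeros((height, width), dtype=float)
    for i in range(height - patch + 1):
        for j in range(width - patch + 1):
            crop = z[i:i + patch, j:j + patch]
            train_crops = train[:, i:i + patch, j:j + patch]
            out[i:i + patch, j:j + patch] += empirical_denoiser(crop, train_crops, sigma)
            count[i:i + patch, j:j + patch] += 1.0
    return out / count


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(8, 6, 6))  # toy stand-in for a training set
    z = train[0] + 0.5 * rng.normal(size=(6, 6))
    global_estimate = empirical_denoiser(z, train, sigma=0.5)
    local_estimate = patch_composite_denoiser(z, train, sigma=0.5, patch=3)
    print(np.mean((global_estimate - local_estimate) ** 2))
```

When the patch covers the whole image, the composite reduces to the global empirical denoiser. With smaller patches, each region can draw on a different training image, so the output need not coincide with any single training example, which is the kind of behaviour the local-denoising hypothesis relies on.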
