Analyzing and Guiding Zero-Shot Posterior Sampling in Diffusion Models
Roi Benita, Michael Elad, Joseph Keshet
TL;DR
This work analyzes zero-shot posterior sampling in diffusion models for linear inverse problems through a spectral lens under a Gaussian prior. It derives closed-form expressions for both the ideal posterior sampler and training-free reconstruction methods, enabling principled comparisons in the spectral domain and a method-agnostic framework for weighting guidance terms. The authors formulate an optimization based on averaged Wasserstein distance to design optimal guidance weights, and provide closed-form, per-frequency transfer functions that decouple across frequencies. Empirical results on synthetic Gaussian data and real datasets (FFHQ and ImageNet) show that the spectral recommendations offer a more balanced trade-off between measurement fidelity and perceptual quality than common heuristics, while reducing per-instance tuning requirements and adapting to diffusion step size.
Abstract
Recovering a signal from its degraded measurements is a long-standing challenge in science and engineering. Recently, zero-shot diffusion-based methods have been proposed for such inverse problems, offering a posterior-sampling solution that leverages prior knowledge. Such algorithms incorporate the observations during inference, often leaning on manual tuning and heuristics. In this work we propose a rigorous analysis of such approximate posterior samplers, relying on a Gaussianity assumption on the prior. Under this regime, we show that both the ideal posterior sampler and diffusion-based reconstruction algorithms can be expressed in closed form, enabling their thorough analysis and comparison in the spectral domain. Building on these representations, we also introduce a principled framework for parameter design, replacing the heuristic selection strategies used to date. The proposed approach is method-agnostic and yields tailored parameter choices for each algorithm, jointly accounting for the characteristics of the prior, the degraded signal, and the diffusion dynamics. We show that our spectral recommendations differ structurally from standard heuristics and vary with the diffusion step size, resulting in a consistent balance between perceptual quality and signal fidelity.
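To make the closed-form claim concrete, the following is a minimal sketch of the textbook Gaussian posterior for a linear inverse problem, written per frequency. It is not the paper's derivation or code. It assumes (these are illustrative assumptions, not statements from the abstract) that the prior covariance and the degradation operator share an eigenbasis, that the prior is zero-mean, that the noise is white Gaussian with variance `noise_var > 0`, and that all spectral coefficients are real. The function names `gaussian_posterior_spectral` and `w2_squared_diag` are hypothetical. The second function computes the standard squared 2-Wasserstein distance between two Gaussians with diagonal covariances in the same basis, which may differ from the averaged Wasserstein objective used by the authors.

```python
import numpy as np


def gaussian_posterior_spectral(prior_var, h, noise_var, y_hat):
    """Per-frequency posterior of x given y = h * x + n (illustrative sketch).

    Assumes x_i ~ N(0, prior_var[i]) and n_i ~ N(0, noise_var), independent
    across frequencies, with noise_var > 0.
    """
    prior_var = np.asarray(prior_var, dtype=float)
    h = np.asarray(h, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)

    denom = h**2 * prior_var + noise_var
    # Wiener-type gain applied to each measurement coefficient.
    gain = prior_var * h / denom
    post_mean = gain * y_hat
    # Posterior variance: equals the prior variance wherever h == 0.
    post_var = prior_var * noise_var / denom
    return post_mean, post_var


def w2_squared_diag(mean_a, var_a, mean_b, var_b):
    """Squared 2-Wasserstein distance between two Gaussians whose
    covariances are diagonal in the same basis."""
    mean_a, mean_b = np.asarray(mean_a, float), np.asarray(mean_b, float)
    var_a, var_b = np.asarray(var_a, float), np.asarray(var_b, float)
    mean_term = np.sum((mean_a - mean_b) ** 2)
    cov_term = np.sum((np.sqrt(var_a) - np.sqrt(var_b)) ** 2)
    return float(mean_term + cov_term)
```

Under these assumptions, a sampler whose output distribution is also Gaussian and diagonal in the same basis could be scored frequency by frequency by passing its mean and variance, together with the ideal ones from `gaussian_posterior_spectral`, to `w2_squared_diag`.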
