Guess & Guide: Gradient-Free Zero-Shot Diffusion Guidance

Abduragim Shtanchaev, Albina Ilina, Yazid Janati, Arip Asadulaev, Martin Takáč, Eric Moulines

TL;DR

This work proposes the fastest and Pareto optimal method for Bayesian inverse problems by introducing a lightweight likelihood surrogate that eliminates the need to calculate gradients through the denoiser network.

Abstract

Pretrained diffusion models serve as effective priors for Bayesian inverse problems. These priors enable zero-shot generation by sampling from the conditional distribution, which avoids the need for task-specific retraining. However, a major limitation of existing methods is their reliance on surrogate likelihoods that require vector-Jacobian products at each denoising step, creating a substantial computational burden. To address this, we introduce a lightweight likelihood surrogate that eliminates the need to calculate gradients through the denoiser network. This enables us to handle diverse inverse problems without backpropagation overhead. Experiments confirm that using our method, the inference cost drops dramatically. At the same time, our approach delivers the highest results in multiple tasks. Broadly speaking, we propose the fastest and Pareto optimal method for Bayesian inverse problems.

Paper Structure

This paper contains 52 sections, 7 theorems, 45 equations, 21 figures, 14 tables, 1 algorithm.

Key Result

Proposition 2.2

Assume the Gaussian likelihood in eq:app_posterior_target and Assumption ass:local_gaussian. Then the maximizer of the approximate conditional posterior $q_t(\mathbf{x} \mid \mathbf{z}_t, \mathbf{y}) \propto \exp\!\left(-\frac{1}{2\sigma_y^2}\|\mathbf{y}-\mathcal{A}(\mathbf{x})\|^2\right)\, q_t(\mat

Figures (21)

  • Figure 1: FFHQ and ImageNet qualitative results. Comparison against baseline methods over different tasks: Gaussian Deblurring and Center Inpainting on FFHQ (top and middle rows respectively), and Motion Deblurring on ImageNet (bottom row). Guess and Guide (ours) achieves more accurate and detailed restoration compared to existing approaches. See the appendix sections on FFHQ and ImageNet images for more samples.
  • Figure 2: Impact of the initial timestep $t_*$ on the performance of the G&G method. The first row of images shows the visual reconstruction quality for different $t_*$ values, and the second row shows the corresponding initial guess. The plots show how the metrics and runtime vary across $t_*$ values, together with the standard deviation of the metrics.
  • Figure 3: Reconstructions for half-mask inpainting on the FFHQ dataset.
  • Figure 4: JPEG dequantization with QF = 2 on the FFHQ dataset.
  • Figure 5: Reconstructions for Gaussian deblurring on the ImageNet dataset.
  • ...and 16 more figures

Theorems & Definitions (20)

  • Proposition 2.2: Pixel-space objective as approximate conditional MAP
  • proof
  • Corollary 2.3: Proximal interpretation
  • proof
  • Remark 2.4: Linear closed form
  • Remark 2.5: Stability
  • Remark 2.6: Scaling $\lambda_t$
  • Remark 2.7: Guidance schedule
  • Remark 2.8: Phase 1 as repeated approximate conditioning at $t_*$
  • Remark 2.9: How to read this subsection
  • ...and 10 more
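
The titles of Proposition 2.2, Corollary 2.3, and Remark 2.4 point to a proximal-style objective with a closed form for linear operators. The proposition is truncated in this listing, so the sketch below is not the paper's algorithm. It assumes, purely for illustration, that the local Gaussian approximation is isotropic, $\mathcal{N}(\boldsymbol{\mu}, \lambda \mathbf{I})$, and that $\mathcal{A}$ is a matrix `A`. The names `proximal_map_linear`, `mu`, and `lam` are hypothetical. Under those assumptions, the maximizer of the Gaussian likelihood times the Gaussian factor solves a linear system and needs no gradient through any network:

```python
import numpy as np

def proximal_map_linear(y, A, mu, sigma_y, lam):
    """Closed-form minimizer of
        1/(2 sigma_y^2) * ||y - A x||^2 + 1/(2 lam) * ||x - mu||^2.
    ASSUMPTION: isotropic Gaussian factor N(mu, lam * I) and linear A;
    neither is confirmed by the truncated source text.
    """
    d = A.shape[1]
    # Normal equations: (A^T A / sigma_y^2 + I / lam) x = A^T y / sigma_y^2 + mu / lam
    H = A.T @ A / sigma_y**2 + np.eye(d) / lam
    b = A.T @ y / sigma_y**2 + mu / lam
    return np.linalg.solve(H, b)

# Toy problem: 3 linear measurements of a 5-dimensional signal (synthetic data).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
x_true = rng.standard_normal(5)
y = A @ x_true
mu = np.zeros(5)  # stand-in for a denoiser's guess; hypothetical

sigma_y, lam = 0.1, 1.0
x_map = proximal_map_linear(y, A, mu, sigma_y, lam)

# The gradient of the objective should vanish at the minimizer.
grad = -A.T @ (y - A @ x_map) / sigma_y**2 + (x_map - mu) / lam

# A very small lam pins the solution to mu (the quadratic penalty dominates).
x_pinned = proximal_map_linear(y, A, mu, sigma_y, lam=1e-12)
```

In this toy setting `lam` trades off data fidelity against staying close to `mu`: a large value lets the measurements dominate, while a small value keeps the solution near the guess.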