Efficient Zero-Shot Inpainting with Decoupled Diffusion Guidance
Badr Moufad, Navid Bagheri Shouraki, Alain Oliviero Durmus, Thomas Hirtz, Eric Moulines, Jimmy Olsson, Yazid Janati
TL;DR
This work tackles zero-shot inpainting with pretrained diffusion priors by introducing Decoupled INpainting Guidance (DInG), a method that requires no vector-Jacobian products (VJPs) and yields exact Gaussian posterior transitions via decoupled likelihood surrogates in latent space. By evaluating the likelihood on an independent proxy and leveraging Gaussian conjugacy, DInG avoids backpropagation through the denoiser while maintaining high fidelity to observed regions and producing realistic completions, especially when the number of function evaluations (NFE) is small. Across FFHQ, DIV2K, and PIE-Bench, DInG outperforms state-of-the-art zero-shot baselines and even surpasses a fine-tuned SD3 model on editing tasks, demonstrating strong observation consistency with efficient inference. The approach offers a practical, memory-efficient path to high-quality zero-shot inpainting with latent diffusion models, with implications for real-time image editing and restoration.
Abstract
Diffusion models have emerged as powerful priors for image editing tasks such as inpainting and local modification, where the objective is to generate realistic content that remains consistent with observed regions. In particular, zero-shot approaches that leverage a pretrained diffusion model, without any retraining, have been shown to achieve highly effective reconstructions. However, state-of-the-art zero-shot methods typically rely on a sequence of surrogate likelihood functions, whose scores are used as proxies for the ideal score. This procedure requires vector-Jacobian products through the denoiser at every reverse step, which introduces significant memory and runtime overhead. To address this issue, we propose a new likelihood surrogate that yields Gaussian posterior transitions that are simple and efficient to sample, sidestepping backpropagation through the denoiser network. Our extensive experiments show that our method achieves strong observation consistency compared with fine-tuned baselines and produces coherent, high-quality reconstructions, all while significantly reducing inference cost. Code is available at https://github.com/YazidJanati/ding.
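
To illustrate why a Gaussian likelihood surrogate leads to a transition that can be sampled in closed form, here is a minimal sketch of Gaussian conjugacy under a masked observation. It is not the authors' implementation. It assumes an isotropic Gaussian prior in pixel space and observations of the form y = x + noise on the observed pixels, whereas the abstract describes a surrogate evaluated in latent space. The function names and parameters (`gaussian_masked_posterior`, `sample_posterior`, `prior_var`, `obs_var`) are hypothetical.

```python
import numpy as np


def gaussian_masked_posterior(prior_mean, prior_var, y, mask, obs_var):
    """Posterior of x ~ N(prior_mean, prior_var * I) given y_i = x_i + noise
    (noise variance obs_var) on the pixels where mask is True.

    Illustrative assumption only: isotropic prior, pixel-space mask.
    Returns the per-pixel posterior mean and variance.
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Ignore whatever values y holds outside the observed region.
    y = np.where(mask, np.asarray(y, dtype=float), 0.0)

    # Precisions add: observed pixels gain 1 / obs_var, the others keep the prior.
    post_prec = 1.0 / prior_var + mask / obs_var
    post_var = 1.0 / post_prec
    # Precision-weighted average of the prior mean and the observation.
    post_mean = post_var * (prior_mean / prior_var + mask * y / obs_var)
    return post_mean, post_var


def sample_posterior(prior_mean, prior_var, y, mask, obs_var, rng):
    """Draw one exact sample from the Gaussian posterior, with no gradients."""
    mean, var = gaussian_masked_posterior(prior_mean, prior_var, y, mask, obs_var)
    return mean + np.sqrt(var) * rng.standard_normal(mean.shape)
```

Because the posterior is available in closed form, sampling costs a few elementwise operations and needs no backpropagation. In this toy setting, observed pixels are pulled toward y with reduced variance, while unobserved pixels keep the prior mean and variance.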
