PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing
Feng Tian, Yixuan Li, Yichao Yan, Shanyan Guan, Yanhao Ge, Xiaokang Yang
TL;DR
PostEdit tackles three core challenges in image editing: controllability, background preservation, and efficiency. It reframes editing as posterior sampling in diffusion models, using a measurement term that encodes features of the initial image to steer sampling toward the target prompt while preserving unedited regions. The method is inversion-free and training-free: it optimizes in latent space with Langevin dynamics and then applies a weighted fusion with the input latent. It runs in about 1.5 seconds per edit and achieves state-of-the-art editing performance on PIE-Bench. These properties make it practical for interactive image editing, delivering high-quality edits with strong background fidelity across diverse scenes.
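For readers new to posterior sampling, the generic textbook form of the idea is sketched below. By Bayes' rule, the posterior score splits into a prior score (which a diffusion model supplies) and a measurement term. Unadjusted Langevin dynamics then follows that score with injected Gaussian noise, and a final weighted fusion blends the result with the input latent. The step size $\eta$, iteration count $K$, fusion weight $w$, and input latent $z_{\mathrm{in}}$ are illustrative symbols assumed here. This is not the exact PostEdit formulation.

```latex
\nabla_{x}\log p(x \mid y) = \nabla_{x}\log p(x) + \nabla_{x}\log p(y \mid x)

x_{k+1} = x_{k} + \eta\,\nabla_{x}\log p(x_{k} \mid y) + \sqrt{2\eta}\,\epsilon_{k},
\qquad \epsilon_{k} \sim \mathcal{N}(0, I),\quad k = 0, \dots, K-1

\hat{x} = w\,x_{K} + (1 - w)\,z_{\mathrm{in}}, \qquad w \in [0, 1]
```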
Abstract
In image editing, three core challenges persist: controllability, background preservation, and efficiency. Inversion-based methods preserve the features of the initial image through time-consuming optimization, which requires extensive network inference and therefore limits efficiency. Conversely, inversion-free methods gain efficiency by sidestepping the preservation of initial features, and consequently lack theoretical support for background similarity. As a result, neither family of methods achieves both high efficiency and background consistency. To address these limitations, we introduce PostEdit, a method that uses a posterior scheme to govern the diffusion sampling process. Specifically, we introduce a measurement term, related to both the initial features and Langevin dynamics, that optimizes the image estimate generated from the given target prompt. Extensive experiments show that PostEdit achieves state-of-the-art editing performance while accurately preserving unedited regions. Furthermore, the method is both inversion-free and training-free, requiring approximately 1.5 seconds and 18 GB of GPU memory to generate high-quality results.
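The toy sketch below, written in Python with NumPy, illustrates Langevin posterior sampling with a measurement term, followed by a weighted fusion with an input latent. To keep the score available in closed form, it assumes a Gaussian prior and a Gaussian measurement y = x + noise. In PostEdit, the prior score would instead come from a diffusion model and the measurement from the initial image, and neither is modeled here. The function names `langevin_posterior_sample` and `weighted_fusion` and the fusion weight `w` are hypothetical.

```python
import numpy as np


def langevin_posterior_sample(y, prior_mean, prior_var, meas_var,
                              step=1e-2, n_steps=2000, rng=None):
    """Toy unadjusted Langevin sampler for p(x | y).

    Assumes (hypothetically) a Gaussian prior N(prior_mean, prior_var * I)
    and a Gaussian measurement y = x + n, n ~ N(0, meas_var * I).
    This is an illustration of the technique, not the PostEdit algorithm.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(prior_mean, dtype=float).copy()  # start chains at the prior mean
    for _ in range(n_steps):
        grad_prior = -(x - prior_mean) / prior_var  # score of the prior
        grad_meas = -(x - y) / meas_var             # measurement term: grad of log p(y | x)
        noise = rng.standard_normal(x.shape)
        # Langevin step: follow the posterior score, then inject scaled noise
        x = x + step * (grad_prior + grad_meas) + np.sqrt(2.0 * step) * noise
    return x


def weighted_fusion(x_edit, x_input, w):
    """Blend the sampled latent with the input latent (w is a hypothetical weight)."""
    return w * x_edit + (1.0 - w) * x_input


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 10_000  # independent chains, one per coordinate
    samples = langevin_posterior_sample(np.full(n, 2.0), np.zeros(n), 1.0, 1.0, rng=rng)
    print("posterior mean estimate:", samples.mean())
    fused = weighted_fusion(samples, np.zeros(n), w=0.8)
    print("fused mean:", fused.mean())
```

Because the toy posterior is Gaussian, its mean, (μ/σ_p² + y/σ_m²) / (1/σ_p² + 1/σ_m²), is known analytically, so convergence is easy to check. With μ = 0, y = 2, and unit variances, the sample mean should land close to 1.0.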
