Edicho: Consistent Image Editing in the Wild
Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen
TL;DR
Edicho tackles inconsistent cross-image edits in uncontrolled real-world images by introducing explicit correspondence into diffusion-based editing. It combines Corr-Attention and Corr-CFG to steer denoising with pre-estimated image correspondences, enabling training-free, plug-and-play edits that generalize across images and editing tasks. Quantitative and qualitative results show better text alignment and editing consistency than strong baselines, along with practical applications in customization and 3D reconstruction. The approach preserves pre-trained generative priors and performs robustly in diverse, in-the-wild scenarios. Its limitations arise mainly from correspondence misalignment, which can distort textures and which better correspondence extractors are expected to mitigate.
Abstract
Consistent editing across in-the-wild images is a verified need, yet it remains a technical challenge because of various unmanageable factors, such as object poses, lighting conditions, and photography environments. Edicho steps in with a training-free solution based on diffusion models, whose fundamental design principle is to use explicit image correspondence to direct editing. Specifically, the key components are an attention manipulation module and a carefully refined classifier-free guidance (CFG) denoising strategy, both of which take the pre-estimated correspondence into account. This inference-time algorithm is plug-and-play and compatible with most diffusion-based editing methods, such as ControlNet and BrushNet. Extensive results demonstrate the efficacy of Edicho in consistent cross-image editing under diverse settings. We will release the code to facilitate future studies.
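To make the two ingredients named above more concrete, the following is a minimal toy sketch in pure Python. It is not Edicho's Corr-Attention or Corr-CFG, whose details the abstract does not give. It only shows the standard classifier-free guidance combination and one simple way a pre-estimated correspondence map could be used to pull features from one image into another. The function names `cfg_combine` and `blend_with_correspondence`, the blending weight `alpha`, and the use of `-1` for "no match" are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def cfg_combine(eps_uncond: Sequence[float],
                eps_cond: Sequence[float],
                scale: float) -> Vector:
    """Standard classifier-free guidance on noise predictions:
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def blend_with_correspondence(target_feats: Sequence[Vector],
                              source_feats: Sequence[Vector],
                              correspondence: Sequence[int],
                              alpha: float) -> List[Vector]:
    """Hypothetical illustration (not the paper's module).

    correspondence[i] is the index of the source token matched to
    target token i, or -1 when no match was estimated. Matched target
    features are blended with the corresponding source features;
    unmatched target features are left unchanged.
    """
    blended: List[Vector] = []
    for i, feat in enumerate(target_feats):
        j = correspondence[i]
        if j < 0:
            # No reliable match: keep the target feature as-is.
            blended.append(list(feat))
        else:
            src = source_feats[j]
            blended.append([alpha * s + (1.0 - alpha) * t
                            for s, t in zip(src, feat)])
    return blended


if __name__ == "__main__":
    # Guidance scale 2.0 pushes the prediction past the conditional one.
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0))

    target = [[0.0, 0.0], [4.0, 4.0]]
    source = [[2.0, 2.0], [8.0, 8.0]]
    # Target token 0 matches source token 1; target token 1 has no match.
    print(blend_with_correspondence(target, source, [1, -1], 0.5))
```

In this sketch, tokens without an estimated match fall back to their own features, which reflects the general point that correspondence quality bounds how much cross-image information can safely be shared.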
