Image Inpainting Models are Effective Tools for Instruction-guided Image Editing
Xuan Ju, Junhao Zhuang, Zhaoyang Zhang, Yuxuan Bian, Qiang Xu, Ying Shan
TL;DR
Instruction-guided image editing remains challenging when it relies on jointly trained LLM-diffusion models or data-heavy fine-tuning. The paper proposes IIIE, a 4-step Inpainting-based Instruction-guided Image Editing pipeline that decouples language understanding from generation: an LLM derives the editing category, target object, mask, and prompt, and a localized inpainting model then performs the edit. By avoiding joint fine-tuning and relying on intermediary guidance, IIIE achieves higher faithfulness, content preservation, and instruction following on the MAGIC benchmark than prior methods. The approach shows that a simple, modular integration of LLMs with image inpainting can yield strong editing performance through scalable, explainable steps, and the authors provide open-source resources for replication.
Abstract
This is the technical report for the winning solution of the Instruction-guided Image Editing track of the CVPR 2024 GenAI Media Generation Challenge Workshop. Instruction-guided image editing has been studied extensively in recent years. The most advanced methods, such as SmartEdit and MGIE, usually combine large language models with diffusion models through joint training, where the former provides text understanding and the latter provides image generation. However, in our experiments, we find that simply connecting large language models and image generation models through intermediary guidance such as masks, instead of joint fine-tuning, leads to better editing performance and a higher success rate. We use a four-step process, IIIE (Inpainting-based Instruction-guided Image Editing): editing category classification, main editing object identification, editing mask acquisition, and image inpainting. Results show that, with proper combinations of language models and image inpainting models, our pipeline can reach a high success rate with satisfying visual quality.
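The four steps can be illustrated with a minimal, hypothetical sketch in Python. Nothing here is the authors' code: plan_edit, acquire_mask, inpaint, and edit are illustrative names; a keyword rule stands in for the LLM in steps 1 and 2, a caller-supplied bounding box stands in for a detector or segmenter in step 3, and a mean-fill stands in for a diffusion inpainting model in step 4. What the sketch does show is the decoupling: the language side emits only a category, a target, and a prompt, and the generator sees only the image, a mask, and that prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Mask = List[List[bool]]          # True marks pixels the inpainter may change
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


@dataclass
class EditPlan:
    category: str  # e.g. "remove", "add", "replace"
    target: str    # main object to edit
    prompt: str    # text prompt handed to the inpainting model


def plan_edit(instruction: str) -> EditPlan:
    """Steps 1-2 (hypothetical keyword rule standing in for an LLM)."""
    words = instruction.lower().rstrip(".").split()
    category = words[0] if words[0] in ("remove", "add", "replace") else "replace"
    # Assumption: the target is the word after "the", e.g. "remove the cup".
    target = words[words.index("the") + 1] if "the" in words else words[-1]
    prompt = "background" if category == "remove" else instruction
    return EditPlan(category, target, prompt)


def acquire_mask(image: Image, target: str, boxes: Dict[str, Box]) -> Mask:
    """Step 3 (hypothetical): turn a box for the target into a pixel mask."""
    height, width = len(image), len(image[0])
    top, left, bottom, right = boxes[target]
    return [[top <= r < bottom and left <= c < right for c in range(width)]
            for r in range(height)]


def inpaint(image: Image, mask: Mask, prompt: str) -> Image:
    """Step 4 (toy stand-in for a diffusion inpainter): fill masked pixels
    with the mean of the unmasked pixels. The prompt is ignored here."""
    kept = [p for row, mrow in zip(image, mask)
            for p, m in zip(row, mrow) if not m]
    fill = round(sum(kept) / len(kept)) if kept else 0
    return [[fill if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]


def edit(image: Image, instruction: str, boxes: Dict[str, Box],
         inpainter: Callable[[Image, Mask, str], Image] = inpaint) -> Image:
    """Chain the steps; only the mask and prompt cross the language/image boundary."""
    plan = plan_edit(instruction)
    mask = acquire_mask(image, plan.target, boxes)
    return inpainter(image, mask, plan.prompt)


img = [[10, 10, 10, 10],
       [10, 90, 90, 10],
       [10, 90, 90, 10],
       [10, 10, 10, 10]]
out = edit(img, "Remove the cup.", {"cup": (1, 1, 3, 3)})
print(out)  # [[10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10], [10, 10, 10, 10]]
```

Because the inpainter is passed in as a parameter, a real inpainting model could be swapped in without touching the planning steps, which mirrors the modular, no-joint-training design described above.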
