MagicQuill: An Intelligent Interactive Image Editing System
Zichen Liu, Yue Yu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Wen Wang, Zhiheng Liu, Qifeng Chen, Yujun Shen
TL;DR
MagicQuill tackles the challenge of precise, interactive image editing by combining three components: a dual-branch, diffusion-based Editing Processor; a Painting Assistor that predicts the user's painting intent; and a cross-platform Idea Collector UI. The Editing Processor provides edge- and color-guided control, while the Painting Assistor, an MLLM fine-tuned with LoRA on the Draw&Guess dataset, predicts contextually appropriate prompts to minimize manual input. Key contributions include the dedicated Draw&Guess dataset and the LoRA-fine-tuned MLLMs, a plug-and-play editing toolkit compatible with multiple Stable Diffusion (SD) weights, and comprehensive user studies showing improved precision, efficiency, and usability over baselines. The work demonstrates strong generalization across fine-tuned diffusion models and validates the practicality of an open-source, interactive editing framework for creative workflows.
Abstract
Image editing involves a variety of complex tasks and requires efficient and precise manipulation techniques. In this paper, we present MagicQuill, an integrated image editing system that enables swift actualization of creative ideas. Our system features a streamlined yet functionally robust interface, allowing for the articulation of editing operations (e.g., inserting elements, erasing objects, altering color) with minimal input. These interactions are monitored by a multimodal large language model (MLLM) to anticipate editing intentions in real time, bypassing the need for explicit prompt entry. Finally, we apply a powerful diffusion prior, enhanced by a carefully learned two-branch plug-in module, to process editing requests with precise control. Experimental results demonstrate the effectiveness of MagicQuill in achieving high-quality image edits. Please visit https://magic-quill.github.io to try out our system.
