RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Tianyuan Qu; Lei Ke; Xiaohang Zhan; Longxiang Tang; Yuqi Liu; Bohao Peng; Bei Yu; Dong Yu; Jiaya Jia

TL;DR

Instruction-Visual Complexity (IV-Complexity) describes the difficulty of making precise edits when intricate instructions meet cluttered or ambiguous scenes. RePlan is a region-grounded plan-then-execute framework that couples a vision-language planner with a diffusion editor: the editor applies changes through a training-free attention-region injection mechanism, and the planner is strengthened with GRPO-based reinforcement learning on roughly 1K instruction-only examples. The work also proposes IV-Edit, a benchmark designed to stress fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan improves regional precision and overall fidelity over strong baselines trained on far larger datasets, and it performs multi-region edits in a single pass.

Abstract

Instruction-based image editing enables natural-language control over visual modifications, yet existing models falter under Instruction-Visual Complexity (IV-Complexity), where intricate instructions meet cluttered or ambiguous scenes. We introduce RePlan (Region-aligned Planning), a plan-then-execute framework that couples a vision-language planner with a diffusion editor. The planner decomposes instructions via step-by-step reasoning and explicitly grounds them to target regions; the editor then applies changes using a training-free attention-region injection mechanism, enabling precise, parallel multi-region edits without iterative inpainting. To strengthen planning, we apply GRPO-based reinforcement learning using 1K instruction-only examples, yielding substantial gains in reasoning fidelity and format reliability. We further present IV-Edit, a benchmark focused on fine-grained grounding and knowledge-intensive edits. Across IV-Complex settings, RePlan consistently outperforms strong baselines trained on far larger datasets, improving regional precision and overall fidelity. Our project page: https://replan-iv-edit.github.io
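As a rough illustration of the GRPO step mentioned in the abstract, the sketch below shows the group-relative advantage that GRPO-style training generally relies on: several candidate outputs are sampled for one prompt, each is scored, and each score is normalized against the mean and standard deviation of its own group. The function name, the reward values, and the choice of population standard deviation are assumptions made for illustration; the abstract does not describe RePlan's reward design or training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled outputs.

    Hypothetical helper: each reward is centered on the group mean and
    scaled by the group standard deviation (population form assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four candidate plans sampled for one instruction
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
```

Plans scoring above the group average receive a positive advantage and are reinforced, while those below it are discouraged, so no separately learned value model is needed for the baseline.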
