VeloEdit: Training-Free Consistent and Continuous Instruction-Based Image Editing via Velocity Field Decomposition

Zongqing Li, Zhihui Liu, Yujie Xie, Shansiyuan Wu, Hongshen Lv, Songzhi Su

Abstract

Instruction-based image editing aims to modify source content according to textual instructions. However, existing methods built upon flow matching often struggle to maintain consistency in non-edited regions due to denoising-induced reconstruction errors that cause drift in preserved content. Moreover, they typically lack fine-grained control over edit strength. To address these limitations, we propose VeloEdit, a training-free method that enables highly consistent and continuously controllable editing. VeloEdit dynamically identifies editing regions by quantifying the discrepancy between the velocity fields responsible for preserving source content and those driving the desired edits. Based on this partition, we enforce consistency in preservation regions by substituting the editing velocity with the source-restoring velocity, while enabling continuous modulation of edit intensity in target regions via velocity interpolation. Unlike prior works that rely on complex attention manipulation or auxiliary trainable modules, VeloEdit operates directly on the velocity fields. Extensive experiments on Flux.1 Kontext and Qwen-Image-Edit demonstrate that VeloEdit improves visual consistency and editing continuity with negligible additional computational cost. Code is available at https://github.com/xmulzq/VeloEdit.
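The partition-then-blend step described in the abstract can be illustrated with a small NumPy sketch. This is only a rough illustration under stated assumptions, not the authors' implementation: the function name `blend_velocities`, the choice of per-location cosine similarity as the discrepancy measure, the `(C, H, W)` array layout, and the parameters `tau` (similarity threshold) and `alpha` (edit strength) are all hypothetical choices made for this example.

```python
import numpy as np


def blend_velocities(v_src, v_edit, tau=0.9, alpha=0.5, eps=1e-8):
    """Combine a source-preserving and an editing velocity field.

    Hypothetical helper for illustration only.
    v_src, v_edit: arrays of shape (C, H, W).
    tau:   similarity threshold (assumed); locations above it are preserved.
    alpha: edit strength in [0, 1] (assumed); 0 keeps the source velocity,
           1 uses the full editing velocity in the edit regions.
    """
    # Assumed discrepancy measure: cosine similarity across the channel axis,
    # computed independently at every spatial location.
    dot = (v_src * v_edit).sum(axis=0)
    norms = np.linalg.norm(v_src, axis=0) * np.linalg.norm(v_edit, axis=0)
    similarity = dot / (norms + eps)

    # High similarity -> preservation region; low similarity -> edit region.
    preserve = similarity > tau  # boolean mask of shape (H, W)

    # Linear interpolation between the two velocities controls edit intensity.
    interpolated = (1.0 - alpha) * v_src + alpha * v_edit

    # Preservation regions take the source velocity; edit regions take the blend.
    blended = np.where(preserve[None, :, :], v_src, interpolated)
    return blended, preserve
```

In this sketch, sweeping `alpha` from 0 to 1 at a fixed `tau` would produce a sequence of velocity fields that move gradually from the source-preserving velocity to the editing velocity inside the edit regions, while the preservation regions are left untouched.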

Paper Structure

This paper contains 26 sections, 13 equations, 21 figures, 7 tables, 1 algorithm.

Figures (21)

  • Figure 1: VeloEdit constructs continuous editing trajectories for instruction-based image editing models. Our method empowers these models to achieve continuous and consistent control over edit effects without additional training.
  • Figure 2: Masks derived via the velocity field. These masks separate preservation regions from editing regions, serving as the foundation for enhancing consistency and enabling precise control over editing intensity.
  • Figure 3: Impact of early velocity replacement. Intervening in the initial one or two timesteps completely suppresses the editing effect.
  • Figure 4: Overview of the proposed pipeline. We derive a spatial mask by analyzing the velocity discrepancy between preservation and editing flows. Our method explicitly preserves high-similarity regions while blending low-similarity regions, thereby yielding a sequence of edited results with smooth semantic transitions.
  • Figure 5: Editing results of the high-similarity velocity replacement strategy. By substituting predicted velocities in high-similarity regions ($S_t > \tau$) with the preservation velocity, VeloEdit effectively maintains the structural integrity of non-edited regions.
  • ...and 16 more figures