InstructVEdit: A Holistic Approach for Instructional Video Editing

Chi Zhang; Chengjian Feng; Feng Yan; Qiming Zhang; Mingjin Zhang; Yujie Zhong; Jing Zhang; Lin Ma

TL;DR

Instruction-guided video editing has been hampered by the scarcity of high-quality paired data. InstructVEdit offers a holistic solution: a dataset curation workflow that leverages image editing data, plus two architectural components, the Soft Motion Adapter (SMA) and the Editing-guided Propagation Module (EPM), that improve edit fidelity while maintaining temporal coherence. A multi-round iterative refinement strategy incorporates real-world video data to narrow the gap between the synthetic and real domains. The method achieves state-of-the-art results on the TGVE and TGVE+ benchmarks and performs strongly in user preference evaluations. Together, these pieces form a scalable, practical pipeline for instruction-based video editing that reduces data requirements while improving generalization to real-world scenarios.

Abstract

Editing videos according to instructions is highly challenging because large-scale, high-quality paired video editing data are difficult to collect. This scarcity not only limits the available training data but also hinders systematic exploration of model architectures and training strategies. While prior work has improved specific aspects of video editing (e.g., synthesizing a video dataset with image editing techniques, or decomposed video editing training), a holistic framework addressing these challenges remains underexplored. In this study, we introduce InstructVEdit, a full-cycle instructional video editing approach that: (1) establishes a reliable dataset curation workflow to initialize training, (2) incorporates two architectural improvements to the model that enhance edit quality while preserving temporal consistency, and (3) proposes an iterative refinement strategy that leverages real-world data to improve generalization and minimize train-test discrepancies. Extensive experiments show that InstructVEdit achieves state-of-the-art performance in instruction-based video editing and adapts robustly to diverse real-world scenarios. Project page: https://o937-blip.github.io/InstructVEdit.
