IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
Yinan Chen, Jiangning Zhang, Teng Hu, Yuxiang Zeng, Zhucun Xue, Qingdong He, Chengjie Wang, Yong Liu, Xiaobin Hu, Shuicheng Yan
TL;DR
IVEBench addresses the lack of robust evaluation for instruction-guided video editing (IVE) by introducing a diverse benchmark of 600 source videos spanning seven semantic dimensions, paired with editing prompts that cover 8 task categories and 35 subcategories. It defines a three-dimensional evaluation protocol (Video Quality, Instruction Compliance, and Video Fidelity) that combines traditional metrics with multimodal LLM-based assessments. Experiments indicate that current IVE methods struggle with broad task coverage and per-frame fidelity, while the automatic metrics align closely with human judgments. By open-sourcing the dataset, prompts, and scoring framework, IVEBench aims to standardize and accelerate progress in the field.
Abstract
Instruction-guided video editing has emerged as a rapidly advancing research direction, offering new opportunities for intuitive content transformation while also posing significant challenges for systematic evaluation. Existing video editing benchmarks do not adequately support the evaluation of instruction-guided video editing, and they further suffer from limited source diversity, narrow task coverage, and incomplete evaluation metrics. To address these limitations, we introduce IVEBench, a modern benchmark suite specifically designed for instruction-guided video editing assessment. IVEBench comprises a diverse database of 600 high-quality source videos that span seven semantic dimensions and range in length from 32 to 1,024 frames. It further includes 8 categories of editing tasks with 35 subcategories, whose prompts are generated and refined through large language models and expert review. Crucially, IVEBench establishes a three-dimensional evaluation protocol encompassing video quality, instruction compliance, and video fidelity, integrating both traditional metrics and multimodal large language model-based assessments. Extensive experiments demonstrate the effectiveness of IVEBench in benchmarking state-of-the-art instruction-guided video editing methods, showing its ability to provide comprehensive and human-aligned evaluation outcomes.
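To make the three-dimensional protocol concrete, the following is a minimal sketch of how per-video metric scores might be rolled up into one score per dimension. Only the three dimension names come from the abstract; the record layout, the placeholder metric and category labels, the normalization to [0, 1], and the unweighted averaging are all assumptions made for illustration and are not the scoring framework released with IVEBench.

```python
from dataclasses import dataclass, field
from statistics import mean

# The three evaluation dimensions named in the abstract.
DIMENSIONS = ("video_quality", "instruction_compliance", "video_fidelity")


@dataclass
class EditResult:
    """Scores for one edited video (hypothetical record layout)."""
    video_id: str
    task_category: str  # placeholder label, not an actual IVEBench category name
    # dimension -> {metric name -> score, assumed normalized to [0, 1]}
    scores: dict[str, dict[str, float]] = field(default_factory=dict)


def dimension_scores(result: EditResult) -> dict[str, float]:
    """Average the metrics inside each dimension (unweighted: an assumption)."""
    return {dim: mean(result.scores[dim].values()) for dim in DIMENSIONS}


def benchmark_summary(results: list[EditResult]) -> dict[str, float]:
    """Average each dimension's score over all edited videos."""
    per_video = [dimension_scores(r) for r in results]
    return {dim: mean(pv[dim] for pv in per_video) for dim in DIMENSIONS}


# Toy data with made-up numbers, only to exercise the functions.
results = [
    EditResult("vid_001", "category_a", {
        "video_quality": {"metric_1": 0.8, "metric_2": 0.6},
        "instruction_compliance": {"metric_1": 0.5},
        "video_fidelity": {"metric_1": 0.9, "metric_2": 0.7},
    }),
    EditResult("vid_002", "category_b", {
        "video_quality": {"metric_1": 0.4, "metric_2": 0.6},
        "instruction_compliance": {"metric_1": 1.0},
        "video_fidelity": {"metric_1": 0.6},
    }),
]

summary = benchmark_summary(results)
for dim in DIMENSIONS:
    print(f"{dim}: {summary[dim]:.2f}")
```

Reporting the three dimensions separately, rather than collapsing them into a single number, keeps a method that scores well on video quality but poorly on instruction compliance visible as such.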
