Shot Sequence Ordering for Video Editing: Benchmarks, Metrics, and Cinematology-Inspired Computing Methods
Yuzhi Li, Haojun Xu, Feng Tian
TL;DR
This work addresses the lack of public benchmarks for Shot Sequence Ordering (SSO) in AI-assisted video editing by introducing AVE-Order and ActivityNet-Order, two datasets with complete shot sequences and labels. It adopts the Kendall Tau distance as an SSO evaluation metric and proposes a Kendall Tau Distance-Cross Entropy Loss (KTD-CE) that blends classification and ordering signals. The authors further introduce Cinematology Embedding, which injects film grammar into a Video Transformer framework through movie metadata and shot-type priors, and create AVE-Meta to link shots to that metadata. Experimental results on both benchmarks show that KTD-CE and Cinematology Embedding substantially improve SSO accuracy, and ablations confirm the contribution of each component. Overall, the work provides public datasets, a principled ordering metric, a specialized loss, and metadata-informed modeling to advance professional-level SSO in AI-assisted video editing.
Abstract
With the rising popularity of short video platforms, the demand for video production has increased substantially. However, high-quality video creation still relies heavily on professional editing skills and a nuanced understanding of visual language. To address this challenge, the Shot Sequence Ordering (SSO) task in AI-assisted video editing has emerged as a pivotal approach for enhancing video storytelling and the overall viewing experience. Nevertheless, progress in this field has been impeded by a lack of publicly available benchmark datasets. In response, this paper introduces two novel benchmark datasets, AVE-Order and ActivityNet-Order. Additionally, we employ the Kendall Tau distance as an evaluation metric for the SSO task and propose the Kendall Tau Distance-Cross Entropy Loss. We further introduce the concept of Cinematology Embedding, which incorporates movie metadata and shot labels as prior knowledge into the SSO model, and construct the AVE-Meta dataset to validate the method's effectiveness. Experimental results indicate that the proposed loss function and method substantially enhance SSO task accuracy. All datasets are publicly accessible at https://github.com/litchiar/ShotSeqBench.
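For readers unfamiliar with the metric, the Kendall Tau distance between a predicted shot order and the ground-truth order counts the shot pairs that appear in opposite relative order in the two sequences. The following is a minimal sketch in Python, not the authors' evaluation code; the function names and the normalization by n(n-1)/2 are assumptions made for illustration.

```python
from itertools import combinations


def kendall_tau_distance(predicted, reference):
    """Count the shot pairs whose relative order differs between the two sequences.

    Both arguments are sequences containing the same distinct shot identifiers.
    """
    if len(set(reference)) != len(reference):
        raise ValueError("shot identifiers must be distinct")
    if sorted(predicted) != sorted(reference):
        raise ValueError("both sequences must contain the same shots")

    # Position of every shot in the predicted order.
    position = {shot: index for index, shot in enumerate(predicted)}

    discordant = 0
    # combinations() yields pairs (a, b) with a before b in the reference order.
    for a, b in combinations(reference, 2):
        if position[a] > position[b]:  # the prediction reverses this pair
            discordant += 1
    return discordant


def normalized_kendall_tau_distance(predicted, reference):
    """Scale the distance to [0, 1] by the number of pairs (illustrative choice)."""
    n = len(reference)
    if n < 2:
        return 0.0
    return kendall_tau_distance(predicted, reference) / (n * (n - 1) / 2)
```

Under this sketch, a perfectly ordered three-shot sequence has distance 0, while a fully reversed one has distance 3 (every pair is swapped), which normalizes to 1.0.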
