Detecting AI-Generated Video via Frame Consistency
Long Ma, Zhiyuan Yan, Qinglang Guo, Yong Liao, Haiyang Yu, Pengyuan Zhou
TL;DR
The paper tackles AI-generated video detection by introducing the GVF dataset, a comprehensive benchmark spanning prompts, real/fake video pairs, and multiple generation models. It identifies the limitations of spatial-artifact detectors and proposes DeCoF, a temporal-artifact detector that maps frames to a semantic space via ViT-L/14 and uses a transformer-based verifier to learn frame consistency. Across unseen generation models, DeCoF achieves strong generalization and robustness, outperforming existing detectors in ACC and AUC. This work provides a valuable dataset and a scalable temporal-forensics method to combat disinformation and support media authentication in real-world scenarios.
Abstract
The escalating quality of videos produced by advanced video generation methods creates new security challenges, yet relevant research remains scarce: 1) there is no open-source dataset for generated video detection, and 2) no generated video detection method has been proposed so far. To this end, we propose, for the first time, an open-source dataset and a detection method for generated video. First, we propose a scalable dataset consisting of 964 prompts, covering various forgery targets, scenes, behaviors, and actions, as well as various generation models with different architectures and generation methods, including the most popular commercial models like OpenAI's Sora and Google's Veo. Second, we found via probing experiments that spatial artifact-based detectors lack generalizability. Hence, we propose a simple yet effective detection model based on frame consistency (DeCoF), which focuses on temporal artifacts by eliminating the impact of spatial artifacts during feature learning. Extensive experiments demonstrate the efficacy of DeCoF in detecting videos generated by unseen video generation models and confirm its powerful generalizability across several proprietary commercial models.
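To make the notion of frame consistency more concrete, the following is a minimal sketch and not the DeCoF model itself. It assumes each frame has already been mapped to a feature vector by a frozen image encoder, and it replaces the learned verifier with a hand-made statistic: the cosine similarity between consecutive frame features. The function names (`cosine`, `consecutive_similarities`, `consistency_summary`) and the toy two-dimensional features are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_u * norm_v)


def consecutive_similarities(frame_features):
    """Similarity of each frame's feature vector to the next frame's."""
    if len(frame_features) < 2:
        raise ValueError("need at least two frames")
    return [cosine(a, b) for a, b in zip(frame_features, frame_features[1:])]


def consistency_summary(frame_features):
    """Summarize temporal consistency of a clip in feature space.

    Assumption: a hand-made proxy for illustration only. A learned
    detector would consume the whole feature sequence instead of
    these three numbers.
    """
    sims = consecutive_similarities(frame_features)
    return {"mean": mean(sims), "min": min(sims), "std": pstdev(sims)}


# Toy features (hypothetical): one clip drifts smoothly, the other flips
# between two unrelated directions from frame to frame.
smooth_clip = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [1.0, 0.3]]
jumpy_clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

smooth_stats = consistency_summary(smooth_clip)
jumpy_stats = consistency_summary(jumpy_clip)
```

In this toy setting the smoothly drifting clip has a higher mean similarity than the flipping one. Real generated videos are unlikely to be separable by such a simple statistic, which is one reason to learn the consistency pattern from data rather than fix it by hand.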
