Exposing AI-generated Videos: A Benchmark Dataset and a Local-and-Global Temporal Defect Based Detection Method
Peisong He, Leyao Zhu, Jiaxing Li, Shiqi Wang, Haoliang Li
TL;DR
This work addresses the security risks of AI-generated video by (1) building a diverse dataset of diffusion-generated videos, including samples degraded by realistic lossy operations, and (2) proposing a detector that exploits both local motion prediction errors and global appearance variation, fused via channel attention. The approach shows strong cross-generator generalization and robustness to lossy video operations, surpassing baseline detectors. Together, the dataset and method provide a practical benchmark for evaluating and advancing AI-generated video forensics under varied generation and transmission conditions.
Abstract
Generative models have made significant advances in creating realistic videos, which raises security concerns. However, this emerging risk has not been adequately addressed, owing to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset using advanced diffusion-based video generation algorithms, covering various semantic contents. In addition, typical lossy video operations encountered during network transmission are applied to generate degraded samples. Then, by analyzing the local and global temporal defects of current AI-generated videos, we construct a novel detection framework that exposes fake videos by adaptively learning local motion information and global appearance variation. Finally, experiments are conducted to evaluate the generalization and robustness of different spatial-domain and temporal-domain detection methods; the results serve as a baseline and highlight the research challenges for future studies.
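To make the two temporal cues named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame is a flat list of pixel intensities, stands in a simple linear extrapolation for the learned motion predictor, and uses a fixed sigmoid gate where a trained channel-attention module would use learned layers. All function names are hypothetical.

```python
import math

def local_prediction_error(frames):
    """Per-frame error of predicting frame t from frames t-2 and t-1.

    Assumption: linear extrapolation (2*prev1 - prev2) stands in for a
    learned motion predictor. Returns one error per frame from t=2 onward.
    """
    errors = []
    for t in range(2, len(frames)):
        prev2, prev1, cur = frames[t - 2], frames[t - 1], frames[t]
        predicted = [2 * b - a for a, b in zip(prev2, prev1)]
        errors.append(sum(abs(p - c) for p, c in zip(predicted, cur)) / len(cur))
    return errors

def global_appearance_variation(frames):
    """Distance of each frame's mean intensity from the video-level mean.

    Assumption: mean intensity stands in for a learned appearance feature.
    """
    means = [sum(f) / len(f) for f in frames]
    video_mean = sum(means) / len(means)
    return [abs(m - video_mean) for m in means]

def channel_attention_fuse(channels):
    """Squeeze each channel to its mean, gate it with a sigmoid, reweight.

    Assumption: the gate has no learned parameters; a trained model would
    insert learned layers between the squeeze and the sigmoid.
    """
    squeezed = [sum(c) / len(c) for c in channels]
    weights = [1.0 / (1.0 + math.exp(-s)) for s in squeezed]
    fused = [[w * x for x in c] for w, c in zip(weights, channels)]
    return fused, weights

# Toy video: smooth linear motion for three frames, then an abrupt jump.
frames = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
local = local_prediction_error(frames)
glob = global_appearance_variation(frames)
fused, weights = channel_attention_fuse([local, glob])
```

In this toy input the abrupt jump in the last frame produces a nonzero local prediction error, while the global cue measures how far each frame drifts from the video-level average; the gate then scales each cue by a weight between 0 and 1.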
