Chameleon: On the Scene Diversity and Domain Variety of AI-Generated Videos Detection
Meiyu Zeng, Xingming Liao, Canyu Chen, Nankai Lin, Zhuowei Wang, Chong Chen, Aimin Yang
TL;DR
Chameleon is a diverse AI-generated video dataset that incorporates scene transitions and broad domain variety by combining real videos from multiple sources with multiple generation tools. It provides a benchmark for AI-generated video detection and backtracking-based source retrieval, evaluated with both CNN-based detectors and large vision models. The findings show strong performance for traditional deep-learning detectors (e.g., NPR) and highlight generalization gaps for large vision-language models, while backtracking-based source tracing proves robust when paired with a suitable backbone. The dataset and benchmarks aim to better reflect real-world forensic needs and to support the development of more reliable detectors for complex, dynamic scenes.
Abstract
Artificial intelligence generated content (AIGC), commonly known as DeepFakes, has become a growing concern because it is used as a tool for spreading disinformation. While much research exists on identifying AI-generated text and images, research on detecting AI-generated videos remains limited. Existing datasets for AI-generated video detection are limited in diversity, complexity, and realism. To address these issues, this paper focuses on AI-generated video detection and constructs a diverse dataset named Chameleon. We generate videos using multiple generation tools and a variety of real video sources. At the same time, we preserve the videos' real-world complexity, including scene switches and dynamic perspective changes, and expand beyond face-centered detection to include human actions and environment generation. Our work bridges the gap between AI-generated video dataset construction and real-world forensic needs, offering a valuable benchmark for countering the evolving threats of AI-generated content.
