Chameleon: On the Scene Diversity and Domain Variety of AI-Generated Videos Detection

Meiyu Zeng, Xingming Liao, Canyu Chen, Nankai Lin, Zhuowei Wang, Chong Chen, Aimin Yang

TL;DR

Chameleon introduces a diverse AI-generated video dataset that incorporates scene transitions and broad domain variety by fusing multi-source real videos with multiple generation tools. It offers a rigorous framework for AI-generated video detection and backtracking source retrieval, evaluated against both CNN-based detectors and large vision models. Findings show strong performance for traditional deep-learning detectors (e.g., NPR) and highlight generalization gaps for large language-vision models, while backtracking-based source tracing proves robust with suitable backbones. The dataset and benchmarks aim to better reflect real-world forensic needs and support development of more reliable detectors across complex, dynamic scenes.

Abstract

AI-generated content (AIGC), often referred to as DeepFakes, is a growing concern because it is used to spread disinformation. While much research exists on identifying AI-generated text and images, research on detecting AI-generated videos is limited. Existing datasets for AI-generated video detection are limited in diversity, complexity, and realism. To address these issues, this paper focuses on AI-generated video detection and constructs a diverse dataset named Chameleon. We generate videos using multiple generation tools and a variety of real video sources. At the same time, we preserve the videos' real-world complexity, including scene switches and dynamic perspective changes, and expand beyond face-centered detection to include human actions and environment generation. Our work bridges the gap between AI-generated dataset construction and real-world forensic needs, offering a valuable benchmark to counteract the evolving threats of AI-generated content.

Paper Structure

This paper contains 34 sections, 16 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Examples of Chameleon. Frame sequences are used to show the dynamics of the video. Subfigure (a) shows real-world videos from News, Speech, and Recommendation in the first column, and the second column from the left shows the AI-generated videos based on them. Subfigure (b) shows the scene diversity and domain diversity of Chameleon. Green arrows mark scene diversity and yellow boxes mark domain diversity.
  • Figure 2: Framework building for Chameleon. Blue for AI-generated videos and black for real-world videos. A cross and a check mark indicate that the video is judged to be an AI-generated video and a real-world video, respectively.
  • Figure 3: ROC curves comparing the performance of different models in detecting AI-generated and real-world videos.
  • Figure 4: Accuracy of different methods across categories and generation techniques with confidence thresholds equal to 1.0.
  • Figure 5: Examples of LVM detection on AI-generated and real-world video frames in the Chameleon dataset. The left section presents detection results for AI-generated video frames, while the right section shows detection results for real-world video frames. Each section includes three categories: News, Recommendation, and Speech. The green checkmarks indicate correct predictions, while the red crosses represent incorrect predictions.
  • ...and 6 more figures