Beyond Deepfake Images: Detecting AI-Generated Videos

Danial Samadi Vahdati, Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm

TL;DR

The paper shows that detectors trained on synthetic images do not reliably detect AI-generated videos, because video generators leave substantially different forensic traces than image generators. It demonstrates that these video traces can be learned by CNNs, giving accurate synthetic video detection and generator attribution even after H.264 re-compression, and that video-level aggregation improves performance further. Zero-shot transfer to new generators remains challenging, but few-shot fine-tuning on limited data adapts a detector to an emerging generator. These findings matter for media provenance, platform moderation, and deploying video forensics as new generators appear.

Abstract

Recent advances in generative AI have led to the development of techniques to generate visually realistic synthetic video. While a number of techniques have been developed to detect AI-generated synthetic images, in this paper we show that synthetic image detectors are unable to detect synthetic videos. We demonstrate that this is because synthetic video generators introduce substantially different traces than those left by image generators. Despite this, we show that synthetic video traces can be learned, and used to perform reliable synthetic video detection or generator source attribution even after H.264 re-compression. Furthermore, we demonstrate that while detecting videos from new generators through zero-shot transferability is challenging, accurate detection of videos from a new generator can be achieved through few-shot learning.

Paper Structure (15 sections, 2 equations, 4 figures, 9 tables)

Figures (4)

  • Figure 1: Fourier transform analysis of the forensic traces extracted from different synthetic image and video generators.
  • Figure 2: Detection performance of MISLnet before and after robust training on videos with constant rate factors from 0 to 40.
  • Figure 3: Video-level performance of MISLnet for different numbers of patches used to obtain the video-level detection score.
  • Figure 4: Relative error reduction of video-level performance over frame-level performance of MISLnet for different numbers of patches used to obtain the video-level detection score.
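
The video-level scoring and relative error reduction named in Figures 3 and 4 can be illustrated with a minimal sketch. This summary does not state the paper's exact fusion rule or metric definition, so two assumptions are made here: per-patch detector outputs are fused by a simple mean, and relative error reduction is the percentage of the frame-level error (one minus the metric) that video-level scoring removes. The function names `video_score` and `relative_error_reduction` and all numbers are hypothetical.

```python
import numpy as np


def video_score(patch_scores):
    """Fuse per-patch detector scores into one video-level score.

    Assumption: a plain mean over patches; the paper's fusion rule may differ.
    """
    scores = np.asarray(patch_scores, dtype=float)
    if scores.size == 0:
        raise ValueError("need at least one patch score")
    return float(scores.mean())


def relative_error_reduction(frame_metric, video_metric):
    """Percentage of the frame-level error removed at the video level.

    Assumption: error is defined as 1 - metric (for example, 1 - AUC),
    with both metrics given in the range [0, 1].
    """
    frame_error = 1.0 - frame_metric
    if frame_error <= 0.0:
        raise ValueError("frame-level metric leaves no error to reduce")
    return 100.0 * (video_metric - frame_metric) / frame_error


# Hypothetical per-patch scores for one video (positive means "synthetic").
fused = video_score([2.0, -1.0, 3.0])

# Hypothetical metrics: 0.90 at frame level, 0.98 at video level.
rer = relative_error_reduction(0.90, 0.98)

print(f"video-level score: {fused:.4f}")
print(f"relative error reduction: {rer:.1f}%")
```

Averaging is only one way to fuse scores; it is chosen here because a single misclassified patch then has limited influence on the video-level decision as the number of patches grows.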