Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook

Florinel-Alin Croitoru, Andrei-Iulian Hiji, Vlad Hondru, Nicolae Catalin Ristea, Paul Irofti, Marius Popescu, Cristian Rusu, Radu Tudor Ionescu, Fahad Shahbaz Khan, Mubarak Shah

TL;DR

This survey comprehensively covers deepfake generation and detection in the era of generative AI, spanning image, video, audio, and multimodal content. It constructs a multi-level taxonomy of generation and detection methods, reviews datasets and benchmark performance, and introduces BioDeepAV, an out-of-domain benchmark to assess generalization to unseen generators. The authors find that state-of-the-art detectors generally struggle to generalize to deepfakes produced by newer models, underscoring a need for robust, explainable detectors and calibrated uncertainty measures. The work provides practical guidance for researchers and developers, including a public project page and a rich set of tutorials to accelerate future advances in robust deepfake detection and generation.

Abstract

With the recent advancements in generative modeling, the realism of deepfake content has been increasing at a steady pace, even reaching the point where people often fail to detect manipulated media content online, thus being deceived into various kinds of scams. In this paper, we survey deepfake generation and detection techniques, including the most recent developments in the field, such as diffusion models and Neural Radiance Fields. Our literature review covers all deepfake media types, comprising image, video, audio and multimodal (audio-visual) content. We identify various kinds of deepfakes, according to the procedure used to alter or generate the fake content. We further construct a taxonomy of deepfake generation and detection methods, illustrating the important groups of methods and the domains where these methods are applied. Next, we gather datasets used for deepfake detection and provide updated rankings of the best performing deepfake detectors on the most popular datasets. In addition, we develop a novel multimodal benchmark to evaluate deepfake detectors on out-of-distribution content. The results indicate that state-of-the-art detectors fail to generalize to deepfake content generated by unseen deepfake generators. Finally, we propose future directions to obtain robust and powerful deepfake detectors. Our project page and new benchmark are available at https://github.com/CroitoruAlin/biodeep.

Paper Structure

This paper contains 49 sections, 8 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: A taxonomy of the state-of-the-art deepfake generation and detection methods. The methods are first divided according to the target task: generation versus detection. For each task, the methods are further divided into different kinds of architectures. For each architecture, we separate the methods based on the media types. Large groups are further divided according to the deepfake types presented in Section \ref{sec_deepfake_types}. References are clickable links to papers. Best viewed in color.
  • Figure 2: Deepfake types according to the general procedure used to synthesize the fake content. For deepfake types that apply to multiple domains, we provide the illustration for only one domain. Best viewed in color.
  • Figure 3: Randomly sampled frames captured from the fake videos included in BioDeepAV exhibit a high level of realism. Best viewed in color.
  • Figure 4: An overview of face swapping based on GANs. The generative process is conditioned on an identity encoder $I$ and an attribute encoder $A$, aiming to preserve the target attributes while replacing the target identity with the source identity.
  • Figure 5: An overview of face synthesis based on VAEs. The KL divergence is used to minimize the distribution gap between the distribution of $z$ and the standard Gaussian distribution $p(z)$.
  • ...and 2 more figures
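
The KL term mentioned in the Figure 5 caption can be illustrated with a minimal sketch. It assumes the encoder outputs a diagonal Gaussian over $z$, parameterized by a mean vector and a log-variance vector, which is the common VAE setup but is not stated in the caption. Under that assumption, the divergence to the standard Gaussian $p(z)$ has the closed form $\frac{1}{2}\sum_i (\mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1)$. The function name and arguments below are hypothetical and chosen only for illustration.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption (not from the paper): the encoder posterior is a diagonal
    Gaussian described by per-dimension means `mu` and log-variances `log_var`.
    The result is summed over the latent dimensions.
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0
        for m, lv in zip(mu, log_var)
    )


if __name__ == "__main__":
    # A posterior that already matches the standard Gaussian has zero divergence.
    print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
    # Shifting the mean away from zero increases the divergence.
    print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))
```

Minimizing this quantity pulls the encoder's distribution of $z$ toward $p(z)$, which is what allows new faces to be synthesized by sampling $z$ from the standard Gaussian at generation time.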