Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey

Hong-Hanh Nguyen-Le, Van-Tuan Tran, Dinh-Thuc Nguyen, Nhien-An Le-Khac

TL;DR

This survey addresses passive deepfake (DF) detection in image, video, audio, and multimodal settings, extending the analysis beyond detection accuracy to generalization, robustness, attribution, and real-world resilience. It provides a cross-modality taxonomy of unimodal and multimodal approaches, reviews datasets and benchmarks (e.g., FF++, DFDC, DF40, DeepfakeBench, VoiceWukong), and analyzes the strengths and limitations of current methods. Key contributions include an extended evaluation framework, identification of dataset gaps, and guidance on future directions toward robust deployment on real-world platforms. The work aims to give researchers and practitioners a comprehensive resource for understanding current methods, deployment challenges, and promising avenues for future DF detection research.

Abstract

In recent years, deepfakes (DFs) have been used for malicious purposes, such as impersonating individuals, spreading misinformation, and imitating artists' styles, raising ethical and security concerns. In this survey, we provide a comprehensive review and comparison of passive DF detection across multiple modalities (image, video, audio, and multi-modal) to explore the relationships between them. Beyond detection accuracy, we extend our analysis to performance dimensions essential for real-world deployment: generalization to novel generation techniques, robustness against adversarial manipulations and postprocessing techniques, attribution precision in identifying generation sources, and resilience under real-world operational conditions. Additionally, we analyze the advantages and limitations of existing datasets, benchmarks, and evaluation metrics for passive DF detection. Finally, we propose future research directions that address unexplored and emerging issues in passive DF detection. This survey offers researchers and practitioners a comprehensive resource for understanding the current landscape, methodological approaches, and promising future directions in this rapidly evolving field.

Paper Structure

This paper contains 31 sections, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Overview of our survey structure.
  • Figure 2: Taxonomy of passive DF detection approaches.
  • Figure 3: Illustration of passive DF detection approaches in image modality.
  • Figure 4: Illustration of passive DF detection approaches in video modality.
  • Figure 5: Illustration of passive DF detection approaches in audio modality.
  • ...and 2 more figures