How Generalizable are Deepfake Image Detectors? An Empirical Study

Boquan Li, Jun Sun, Christopher M. Poskitt, Xingmei Wang

TL;DR

This work presents the first empirical study on the generalizability of deepfake detectors, an essential goal for detectors to stay one step ahead of attackers, and finds that there are neurons universally contributing to detection across seen and unseen datasets, suggesting a possible path towards zero-shot generalizability.

Abstract

Deepfakes are becoming increasingly credible, posing a significant threat given their potential to facilitate fraud or bypass access control systems. This has motivated the development of deepfake detection methods, in which deep learning models are trained to distinguish between real and synthesized footage. Unfortunately, existing detectors struggle to generalize to deepfakes from datasets they were not trained on, yet little work has examined why this happens or how the limitation can be addressed. In particular, single-modality deepfake images offer little forgery evidence, making them harder to detect than deepfake videos. In this work, we present the first empirical study on the generalizability of deepfake detectors, an essential goal for detectors to stay one step ahead of attackers. Our study utilizes six deepfake datasets, five deepfake image detection methods, and two model augmentation approaches, confirming that detectors do not generalize in zero-shot settings. Additionally, we find that detectors learn unwanted properties specific to synthesis methods and struggle to extract discriminative features, which limits their ability to generalize. Finally, we find that some neurons contribute universally to detection across seen and unseen datasets, suggesting a possible path towards zero-shot generalizability.

Paper Structure (18 sections, 1 equation, 5 figures, 7 tables)

Figures (5)

  • Figure 1: Comparison of deepfake generation methods.
  • Figure 2: Heatmaps visualizing the regions contributing to detectors' classification decisions across datasets: warm colors (such as red) indicate high contributions and cool colors (such as blue) indicate low contributions.
  • Figure 3: Similarity (SSIM) of heatmaps generated by groups of detectors on unseen datasets.
  • Figure 4: Interpretation of a detection network as an SCM.
  • Figure 5: Causality analysis: the number of (overlapping) neurons that have contributed to classification across the five testing datasets (CELEBV2, FS, NT, DF, and DFD).
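
The overlap count described in the Figure 5 caption can be illustrated with a minimal sketch. The neuron IDs below are made up, and the rule for deciding that a neuron "has contributed" (for example, a threshold on its causal effect) is an assumption left outside the code; only the dataset names come from the caption.

```python
from collections import Counter

def neuron_overlap(contributing):
    """Count how many datasets each neuron contributes on.

    contributing: dict mapping dataset name -> set of neuron ids
    (hypothetical input format, not taken from the paper).
    Returns (universal, histogram): the neurons contributing on every
    dataset, and a map {number of datasets: number of neurons}.
    """
    counts = Counter()
    for neurons in contributing.values():
        counts.update(set(neurons))  # each dataset counts a neuron once
    n_datasets = len(contributing)
    universal = {nid for nid, c in counts.items() if c == n_datasets}
    histogram = dict(Counter(counts.values()))
    return universal, histogram

# Illustrative data only: neuron ids are invented for this example.
example = {
    "CELEBV2": {1, 2, 3, 5},
    "FS": {1, 2, 4},
    "NT": {1, 2, 5},
    "DF": {1, 2, 3},
    "DFD": {1, 2, 6},
}
universal, histogram = neuron_overlap(example)
print(sorted(universal))  # neurons shared by all five datasets
print(histogram)          # how many neurons appear in k datasets
```

Neurons in `universal` correspond to the "universally contributing" neurons mentioned in the abstract; the histogram gives the per-overlap counts that a plot like Figure 5 would display.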

Theorems & Definitions (3)

  • Definition 1: Structural Causal Model (SCM)
  • Definition 2: Detection Model as SCM
  • Definition 3: Average Causal Effect (ACE)
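
For orientation, the average causal effect named in Definition 3 is conventionally written with the do-operator. The form below is the standard textbook one for a binary intervention; the symbols x and y and the intervention values 0 and 1 are assumptions here, and the paper's own definition (for instance, one measured against a baseline for continuous neuron activations) may differ.

```latex
\[
\mathrm{ACE}_{x \to y}
  \;=\; \mathbb{E}\left[\, y \mid do(x = 1) \,\right]
  \;-\; \mathbb{E}\left[\, y \mid do(x = 0) \,\right]
\]
```

Read against Definition 2, x would be a neuron in the detection network and y the network's output, so a large magnitude indicates a neuron whose intervention changes the classification.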