Do Deepfake Detectors Work in Reality?

Simiao Ren, Hengwei Xu, Tsang Ng, Kidus Zewde, Shengkai Jiang, Ramini Desai, Disha Patil, Ning-Yau Cheng, Yining Zhou, Ragavi Muthukrishnan

TL;DR

This paper investigates why deepfake detectors fail on real-world deepfakes. It introduces the Real-World Faceswap Dataset (RWFS) and a self-swap technique that reveals the effects of post-processing. Detector performance collapses on real-world faceswaps, particularly after super-resolution: AUROC drops from over 0.9 on benchmarks to around 0.7, which points to a distribution shift in the artifacts that detectors rely on. By publishing RWFS and quantifying the impact of super-resolution, the work offers a practical roadmap for improving the robustness and applicability of deepfake detectors, and it underscores the need to evaluate detectors against real-world post-processing in order to mitigate the societal risks of deepfake misuse.

Abstract

Deepfakes, particularly those involving faceswap-based manipulations, have sparked significant societal concern due to their increasing realism and potential for misuse. Despite rapid advancements in generative models, detection methods have not kept pace, creating a critical gap in defense strategies. This disparity is further amplified by the disconnect between academic research and real-world applications, which often prioritize different objectives and evaluation criteria. In this study, we take a pivotal step toward bridging this gap by presenting a novel observation: the post-processing step of super-resolution, commonly employed in real-world scenarios, substantially undermines the effectiveness of existing deepfake detection methods. To substantiate this claim, we introduce and publish the first real-world faceswap dataset, collected from popular online faceswap platforms. We then qualitatively evaluate the performance of state-of-the-art deepfake detectors on real-world deepfakes, revealing that their accuracy approaches the level of random guessing. Furthermore, we quantitatively demonstrate the significant performance degradation caused by common post-processing techniques. By addressing this overlooked challenge, our study underscores a critical avenue for enhancing the robustness and practical applicability of deepfake detection methods in real-world settings.
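
The degradation described above is reported in terms of AUROC, so it may help to see how that metric is computed. The following is a minimal sketch, not the authors' evaluation code: it computes AUROC from detector scores using the rank-sum (Mann-Whitney U) identity in plain Python. The function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration; they are not results from the paper.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for fake, 0 for real. scores: higher means "more likely fake".
    Tied scores share the average of the ranks they span.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one real and one fake sample")

    # Sort sample indices by score, then assign 1-based ranks with tie averaging.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts (fake, real) pairs where the fake sample scores higher.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical toy data (NOT from the paper): four fakes, then four reals.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores_benchmark = [0.95, 0.90, 0.80, 0.70, 0.30, 0.20, 0.10, 0.40]
scores_after_sr = [0.60, 0.35, 0.80, 0.25, 0.30, 0.50, 0.10, 0.40]

print("benchmark AUROC:", auroc(labels, scores_benchmark))
print("post-processed AUROC:", auroc(labels, scores_after_sr))
```

An AUROC of 0.5 corresponds to random guessing and 1.0 to perfect separation of real and fake samples, so a drop from above 0.9 toward 0.7 means the detector ranks a much larger share of real images above fake ones.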

Paper Structure

This paper contains 11 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Real-world faceswap dataset generation process with race-age-gender matching
  • Figure 2: EfficientNet-B4 naive detector pretrained on FF, weights taken from yan2023deepfakebench
  • Figure 3: Self-blended imagery detector pretrained on FF, weights taken from chen2022self
  • Figure 4: Visualizing the self-swap result by plotting the difference.
  • Figure 5: Self-blended images model performance degradation caused by super-resolution on FF++
  • ...and 1 more figure