Real-Time Deepfake Detection in the Real-World
Bar Cavia, Eliahu Horwitz, Tal Reiss, Yedid Hoshen
TL;DR
This work presents LaDeDa, a patch-based deepfake detector that scores individual $q\times q$ patches (with a $9\times9$ receptive field) and pools the patch scores into an image-level prediction, reaching around 99% mAP on standard benchmarks and improving over the prior state of the art. To enable practical deployment, the authors distill LaDeDa into Tiny-LaDeDa, an edge-friendly model with only four convolutional layers that has 375x fewer FLOPs and is 10,000x more parameter-efficient, at a minor cost in accuracy. They also argue that prevailing simulated evaluation protocols do not reflect real-world performance, and they introduce WildRF, a deepfake dataset sourced from social media. LaDeDa achieves the best reported result on WildRF (93.7% mAP), but the remaining gap to perfect accuracy, together with the JPEG robustness analyses, shows that generalization to real-world deepfakes is still an open problem. The work advocates using WildRF for future evaluation and shows that compact models can detect local generation artifacts efficiently enough for real-time use.
Abstract
Recent improvements in generative AI have made synthesizing fake images easy; because such images can be used to cause harm, it is crucial to develop accurate techniques to identify them. This paper introduces the "Locally Aware Deepfake Detection Algorithm" (LaDeDa), which accepts a single 9x9 image patch and outputs its deepfake score. The image deepfake score is the pooled score of its patches. With merely patch-level information, LaDeDa significantly improves over the state-of-the-art, achieving around 99% mAP on current benchmarks. Owing to the patch-level structure of LaDeDa, we hypothesize that the generation artifacts can be detected by a simple model. We therefore distill LaDeDa into Tiny-LaDeDa, a highly efficient model consisting of only 4 convolutional layers. Remarkably, Tiny-LaDeDa has 375x fewer FLOPs and is 10,000x more parameter-efficient than LaDeDa, allowing it to run efficiently on edge devices with a minor decrease in accuracy. These almost-perfect scores raise the question: is the task of deepfake detection close to being solved? Perhaps surprisingly, our investigation reveals that current training protocols prevent methods from generalizing to real-world deepfakes extracted from social media. To address this issue, we introduce WildRF, a new deepfake detection dataset curated from several popular social networks. Our method achieves the top performance of 93.7% mAP on WildRF; however, the large gap from perfect accuracy shows that reliable real-world deepfake detection is still unsolved.
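The patch-then-pool idea in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names, the non-overlapping stride, the use of mean pooling, and the `toy_patch_scorer` placeholder (which stands in for the trained network) are all assumptions made for illustration.

```python
from statistics import fmean
from typing import Callable, List

Patch = List[List[float]]  # a small 2D grid of pixel intensities


def extract_patches(image: Patch, size: int = 9, stride: int = 9) -> List[Patch]:
    """Slide a size x size window over a 2D grayscale image (a list of rows)."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def image_score(image: Patch, patch_scorer: Callable[[Patch], float],
                size: int = 9, stride: int = 9) -> float:
    """Score every patch independently, then pool the scores into one image score.

    Mean pooling is an assumption; the abstract only says the scores are pooled.
    """
    patches = extract_patches(image, size, stride)
    if not patches:
        raise ValueError("image is smaller than a single patch")
    return fmean([patch_scorer(patch) for patch in patches])


def toy_patch_scorer(patch: Patch) -> float:
    """Hypothetical placeholder for a trained patch classifier: mean intensity."""
    return fmean([pixel for row in patch for pixel in row])


if __name__ == "__main__":
    # An 18x18 image yields four non-overlapping 9x9 patches.
    image = [[0.5] * 18 for _ in range(18)]
    print(len(extract_patches(image)), image_score(image, toy_patch_scorer))
```

Because each patch is scored without any global context, the image-level decision depends only on local evidence, which is the property the abstract credits for making a much smaller distilled model plausible.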
