DeepfakeBench: A Comprehensive Benchmark of Deepfake Detection
Zhiyuan Yan, Yong Zhang, Xinhang Yuan, Siwei Lyu, Baoyuan Wu
TL;DR
DeepfakeBench addresses the lack of a standardized benchmark for deepfake detection by providing a modular, extensible platform that unifies data processing, detectors, and evaluation protocols. It implements 15 detectors spanning naive, spatial, and frequency categories and covers 9 datasets under a standardized frame-based evaluation. The paper presents extensive experiments on data augmentation, backbone architectures (notably Xception and EfficientNet-B4), and the impact of pretraining, and these reveal notable generalization gaps in cross-domain and cross-manipulation settings. The benchmark and analyses enable fairer comparisons and offer insights to guide future detector design, with code and results publicly available on GitHub.
Abstract
A critical yet frequently overlooked challenge in the field of deepfake detection is the lack of a standardized, unified, comprehensive benchmark. This issue leads to unfair performance comparisons and potentially misleading results. Specifically, data processing pipelines are not uniform, so detection models receive inconsistent inputs. Additionally, experimental settings differ noticeably, and evaluation strategies and metrics lack standardization. To fill this gap, we present the first comprehensive benchmark for deepfake detection, called DeepfakeBench, which offers three key contributions: 1) a unified data management system to ensure consistent input across all detectors, 2) an integrated framework for implementing state-of-the-art methods, and 3) standardized evaluation metrics and protocols to promote transparency and reproducibility. Featuring an extensible, modular codebase, DeepfakeBench contains 15 state-of-the-art detection methods, 9 deepfake datasets, a series of deepfake detection evaluation protocols and analysis tools, as well as comprehensive evaluations. Moreover, we provide new insights based on extensive analysis of these evaluations from various perspectives (e.g., data augmentations, backbones). We hope that our efforts will facilitate future research and foster innovation in this increasingly critical domain. All code, evaluations, and analyses of our benchmark are publicly available at https://github.com/SCLBD/DeepfakeBench.
