Generalized Design Choices for Deepfake Detectors
Lorenzo Pellegrini, Serafino Pandolfini, Davide Maltoni, Matteo Ferrara, Marco Prati, Marco Ramilli
TL;DR
This study tackles the generalization challenge that evolving generators pose for deepfake detectors. It systematically evaluates design choices across training and inference, using the AI-GenBench temporal benchmark and multiple backbones (e.g., ResNet-50 CLIP, ViT-L CLIP, DINOv2). Three choices consistently boost Next Period AUROC: realistic augmentation pipelines (especially evaluation-based augmentations with JPEG post-processing), full-image resizing, and a four-epoch schedule with $am=4$. Direct binary optimization remains robust, though a dual-head multiclass auxiliary loss can aid larger models. For continual updates, replay-based strategies, particularly harmonic replay, offer a practical balance between adaptation and retention, approaching full-retraining performance at reduced compute. The integrated best-of configuration achieves state-of-the-art results on AI-GenBench (e.g., 97.36% Next Period AUROC with DINOv2), offering actionable, architecture-agnostic guidelines for deploying and updating robust deepfake detectors in real-world settings.
Abstract
The effectiveness of deepfake detection methods often depends less on their core design and more on implementation details such as data preprocessing, augmentation strategies, and optimization techniques. This dependence makes it difficult to compare detectors fairly and to understand which choices truly contribute to their performance. To address this, we systematically investigate how different design choices influence the accuracy and generalization capabilities of deepfake detection models, focusing on aspects related to training, inference, and incremental updates. By isolating the impact of individual factors, we aim to establish robust, architecture-agnostic best practices for the design and development of future deepfake detection systems. Our experiments identify a set of design choices that consistently improve deepfake detection and enable state-of-the-art performance on the AI-GenBench benchmark.
