Do DeepFake Attribution Models Generalize?
Spiros Baxavanakis, Manos Schinas, Symeon Papadopoulos
TL;DR
The paper investigates whether DeepFake attribution models generalize across datasets and manipulation types, comparing binary detectors with multi-class attribution models across five backbones and six datasets. It evaluates cross-dataset generalization (RQ1), transfer of seen manipulation methods to unseen datasets (RQ2), and the impact of contrastive losses (RQ3), using metrics such as AUC, EER, and Balanced Accuracy. The findings indicate that binary detectors generally outperform attribution models in cross-dataset settings, while the gains attribution models obtain from contrastive learning are more pronounced for larger models. Data quality strongly affects performance, which points to substantial challenges for deployment in the wild. The work underscores the need for robust generalization strategies in attribution and suggests future exploration of spatiotemporal and foundation-model approaches to capture temporal cues and improve cross-domain robustness.
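For readers unfamiliar with the three metrics named above, the following is a minimal, dependency-free sketch of how they are commonly computed. It is not the authors' evaluation code: the function names, the convention that label 1 means "fake" and higher scores mean "more likely fake", and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random fake outscores a random real sample.

    Ties between a fake and a real score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def equal_error_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Approximate EER: the error rate at the threshold where FPR and FNR are closest."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        # A sample is predicted "fake" when its score is >= t.
        fnr = sum(p < t for p in pos) / len(pos)
        fpr = sum(n >= t for n in neg) / len(neg)
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def balanced_accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Mean per-class recall; works for binary and multi-class (attribution) labels."""
    recalls = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        recalls.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Toy binary example (hypothetical data): 0 = real, 1 = fake.
binary_labels = [0, 0, 0, 1, 1, 1]
binary_scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(roc_auc(binary_labels, binary_scores))
print(equal_error_rate(binary_labels, binary_scores))

# Toy attribution example (hypothetical data): 0 = real, 1 and 2 = two manipulation methods.
attr_labels = [0, 0, 1, 1, 2, 2, 2, 2]
attr_preds = [0, 0, 1, 0, 2, 2, 2, 1]
print(balanced_accuracy(attr_labels, attr_preds))
```

Balanced Accuracy averages recall over classes, so it is not dominated by whichever manipulation type has the most samples. That property is a common reason to prefer it for multi-class attribution on imbalanced data.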
Abstract
Recent advancements in DeepFake generation, along with the proliferation of open-source tools, have significantly lowered the barrier for creating synthetic media. This trend poses a serious threat to the integrity and authenticity of online information, undermining public trust in institutions and media. State-of-the-art research on DeepFake detection has primarily focused on binary detection models. A key limitation of these models is that they treat all manipulation techniques as equivalent, even though different methods introduce distinct artifacts and visual cues. Only a limited number of studies explore DeepFake attribution models, although such models are crucial in practical settings. By identifying the specific manipulation method employed, these models could enhance both perceived trustworthiness and explainability for end users. In this work, we leverage five state-of-the-art backbone models and conduct extensive experiments across six DeepFake datasets. First, we compare binary and multi-class models in terms of cross-dataset generalization. Second, we examine the accuracy of attribution models in detecting seen manipulation methods in unseen datasets, thereby uncovering data distribution shifts within the same DeepFake manipulations. Finally, we assess the effectiveness of contrastive methods in improving cross-dataset generalization performance. Our findings indicate that while binary models demonstrate better generalization abilities, larger models, contrastive methods, and higher data quality can lead to performance improvements in attribution models. The code of this work is available on GitHub.
