SFANet: Spatial-Frequency Attention Network for Deepfake Detection
Vrushank Ahire, Aniruddh Muley, Shivam Zample, Siddharth Verma, Pranav Menon, Surbhi Madan, Abhinav Dhall
TL;DR
Deepfake detection faces challenges in generalizing across diverse datasets and generation techniques. The authors introduce SFANet, a Spatial-Frequency Attention Network that ensembles transformer-based models with texture cues, leveraging data-splitting by human features, fake-data clustering, region-focused face cropping, and BiSeNet-based segmentation to emphasize critical regions. The approach achieves state-of-the-art performance on the DFWild-Cup benchmark, reaching an AUC of $0.9822$ and accuracy of $0.9613$, demonstrating strong cross-dataset robustness across eight datasets. This work highlights the value of hybrid architectures and targeted data handling for real-world deepfake detection and provides a scalable, deployable pipeline for practical use.
Abstract
The recent rise of deepfakes has made detecting manipulated media a pressing problem. Most existing approaches fail to generalize across diverse datasets and generation techniques. We therefore propose a novel ensemble framework that combines transformer-based architectures, such as Swin Transformers and ViTs, with texture-based methods to achieve better detection accuracy and robustness. Our method introduces data-splitting, sequential training, frequency splitting, patch-based attention, and face segmentation techniques to handle dataset imbalances, emphasize high-impact regions (e.g., eyes and mouth), and improve generalization. Our model achieves state-of-the-art performance on the DFWild-Cup dataset, a diverse subset of eight deepfake datasets. The ensemble benefits from the complementarity of its components: transformers excel at global feature extraction, while texture-based methods provide interpretability. This work demonstrates that hybrid models can effectively address the evolving challenges of deepfake detection, offering a robust solution for real-world applications.
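To make the ensemble idea concrete, the following is a minimal sketch of score-level fusion and of the two reported metrics (AUC and accuracy). It is not the authors' implementation: the abstract does not say how the model outputs are combined, so weighted averaging of per-model fake probabilities is an assumption, and the function names, weights, threshold, and toy scores are all hypothetical.

```python
from typing import List, Sequence


def ensemble_scores(model_scores: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted average of per-model fake probabilities (one row per model).

    Assumption: simple score averaging; the paper may fuse models differently.
    """
    if len(model_scores) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(model_scores[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_samples)
    ]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (fake, real) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # fake samples
    neg = [s for y, s in zip(labels, scores) if y == 0]  # real samples
    if not pos or not neg:
        raise ValueError("need at least one real and one fake sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of samples classified correctly at a fixed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up scores for four images (1 = fake, 0 = real); not real results.
labels = [1, 1, 0, 0]
swin_scores = [0.9, 0.4, 0.5, 0.2]      # hypothetical Swin Transformer outputs
vit_scores = [0.6, 0.8, 0.3, 0.7]       # hypothetical ViT outputs
texture_scores = [0.7, 0.7, 0.4, 0.3]   # hypothetical texture-model outputs

fused = ensemble_scores([swin_scores, vit_scores, texture_scores],
                        weights=[1.0, 1.0, 1.0])
print("per-model AUC:", auc(labels, swin_scores), auc(labels, vit_scores),
      auc(labels, texture_scores))
print("ensemble AUC:", auc(labels, fused),
      "accuracy:", accuracy(labels, fused))
```

In this toy setup the two transformer-style score lists each misrank a different real/fake pair, and averaging them with the third list cancels both errors. This illustrates the kind of complementarity the abstract describes, but says nothing about the magnitude of the effect on real data.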
