Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection
Zifeng Li, Wenzhong Tang, Shijun Gao, Shuai Wang, Yanxiang Wang
TL;DR
This work targets the generalization gap in deepfake detection by integrating spatial and frequency priors directly into the backbone. It introduces MkfaNet, a four-stage network built from Multi-Kernel Aggregator (MKA) and Multi-Frequency Aggregator (MFA) blocks that jointly capture multi-scale spatial cues and frequency-domain artifacts. Empirical results on seven benchmarks show MkfaNet achieves superior within-domain and cross-domain performance while using parameter-efficient backbones. The approach enhances robustness to high-quality forgeries and degradation, offering a practical backbone solution for real-world deepfake detection deployments.
Abstract
Deepfake detection faces growing challenges as the rapid progress of generative models has produced a large and diverse range of Deepfake techniques. Recent advances rely on heuristic features introduced from the spatial or frequency domain rather than on modeling general forgery features within the backbone. To address this issue, we turn to backbone design guided by two intuitive priors drawn from spatial and frequency detectors, i.e., learning robust spatial attributes and frequency distributions that discriminate between real and fake samples. To this end, we propose MkfaNet, an efficient network for face forgery detection built from two core modules. For spatial contexts, we design a Multi-Kernel Aggregator that adaptively selects facial-organ features extracted by multiple convolutions to model subtle differences between real and fake faces. For frequency components, we propose a Multi-Frequency Aggregator that processes different frequency bands by adaptively reweighting high-frequency and low-frequency features. Comprehensive experiments on seven popular deepfake detection benchmarks demonstrate that MkfaNet variants achieve superior performance in both within-domain and cross-domain evaluations with high parameter efficiency.
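The abstract describes the two modules only at a high level, so the following NumPy sketch illustrates one plausible reading under stated assumptions; it is not the authors' implementation. The Multi-Kernel Aggregator is modeled with selective-kernel (SKNet-style) gating: depthwise convolutions with kernel sizes 3, 5 and 7 are mixed by per-channel softmax weights computed from a globally pooled descriptor. The Multi-Frequency Aggregator is modeled as a circular mask in the 2D FFT domain that splits features into low- and high-frequency bands, which are then reweighted by per-channel sigmoid gates. All function names, the gating form, the cutoff `radius`, and the parameters `gate_w`, `alpha` and `beta` are hypothetical, and the weights are random placeholders rather than trained values.

```python
import numpy as np

# Hypothetical sketch of the two MkfaNet ideas; NOT the authors' implementation.
# Feature maps have shape (C, H, W). All weights are random placeholders.


def depthwise_conv(x, kernel):
    """Depthwise 2D cross-correlation with zero 'same' padding.

    x: (C, H, W); kernel: (C, k, k) with odd k.
    """
    _, h, w = x.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            # Each channel is filtered only by its own k x k kernel.
            out += kernel[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def multi_kernel_aggregate(x, kernels, gate_w):
    """MKA-like step (assumed SKNet-style gating over kernel sizes).

    kernels: list of B depthwise kernels of different sizes.
    gate_w: hypothetical (B, C, C) weights mapping a pooled descriptor to branch logits.
    """
    branches = np.stack([depthwise_conv(x, k) for k in kernels])  # (B, C, H, W)
    descriptor = branches.sum(axis=0).mean(axis=(1, 2))  # (C,) global average pool
    logits = gate_w @ descriptor  # (B, C)
    weights = softmax(logits, axis=0)  # per channel, branch weights sum to 1
    return (weights[:, :, None, None] * branches).sum(axis=0), weights


def frequency_split(x, radius):
    """Split x into low/high bands with a circular mask in the 2D FFT domain."""
    _, h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.sqrt(fy ** 2 + fx ** 2) <= radius).astype(float)  # (H, W)
    low = np.fft.ifft2(np.fft.fft2(x) * mask).real  # transforms act on last two axes
    return low, x - low


def multi_frequency_aggregate(x, radius, alpha, beta):
    """MFA-like step: reweight bands with hypothetical per-channel logits alpha, beta (C,)."""
    low, high = frequency_split(x, radius)
    return sigmoid(alpha)[:, None, None] * low + sigmoid(beta)[:, None, None] * high


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))
kernels = [0.1 * rng.standard_normal((4, k, k)) for k in (3, 5, 7)]
gate_w = rng.standard_normal((3, 4, 4))
y, branch_w = multi_kernel_aggregate(x, kernels, gate_w)
z = multi_frequency_aggregate(y, radius=0.1, alpha=np.zeros(4), beta=np.ones(4))
print(y.shape, z.shape)  # (4, 16, 16) (4, 16, 16)
```

In this sketch, the softmax across branches lets each channel favor a different receptive field, which is one way to read "adaptively selects". Setting `beta` above `alpha` emphasizes high-frequency content, where many forgery artifacts are often assumed to reside. A learned cutoff or a wavelet decomposition would be an equally plausible stand-in for the FFT mask.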
