Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking

Chandler Timm C. Doloriel, Habib Ullah, Kristian Hovde Liland, Fadi Al Machot, Ngai-Man Cheung

TL;DR

The paper addresses the challenge of universal deepfake detection with low computational overhead. It proposes frequency-domain masking as a supervised training augmentation, alongside spatial masking, geometric transformations, and structured pruning, to foster generalizable representations that transfer across unseen generative models. Empirical results show frequency masking outperforms other augmentations, remains robust under pruning, and yields gains on both standard benchmarks and a specialized aquaculture dataset. The work highlights frequency-domain strategies as a practical path toward sustainable, scalable deepfake detection with real-world applicability.

Abstract

Universal deepfake detection aims to identify AI-generated images across a broad range of generative models, including unseen ones. This requires robust generalization to new and unseen deepfakes, which emerge frequently, while minimizing computational overhead to enable large-scale deepfake screening, a critical objective in the era of Green AI. In this work, we explore frequency-domain masking as a training strategy for deepfake detectors. Unlike traditional methods that rely heavily on spatial features or large-scale pretrained models, our approach introduces random masking and geometric transformations, with a focus on frequency masking due to its superior generalization properties. We demonstrate that frequency masking not only enhances detection accuracy across diverse generators but also maintains performance under significant model pruning, offering a scalable and resource-conscious solution. Our method achieves state-of-the-art generalization on GAN- and diffusion-generated image datasets and exhibits consistent robustness under structured pruning. These results highlight the potential of frequency-based masking as a practical step toward sustainable and generalizable deepfake detection. Code and models are available at: [https://github.com/chandlerbing65nm/FakeImageDetection](https://github.com/chandlerbing65nm/FakeImageDetection).
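
As a rough illustration of the frequency-domain masking described above (FFT, nullify selected frequencies, inverse FFT), here is a minimal NumPy sketch. It is not the authors' implementation, which lives in the linked repository. The function name `frequency_mask`, the radial band thresholds, and the choice to keep only the real part after the inverse transform are all assumptions made for illustration. The default 15% ratio mirrors the masking ratio quoted in the paper's augmentation comparison.

```python
import numpy as np


def frequency_mask(image, ratio=0.15, band="all", rng=None):
    """Zero a random fraction of FFT coefficients inside a chosen band.

    Hypothetical helper, not taken from the paper's code.
    image: float array of shape (H, W) or (H, W, C).
    Returns the masked image and the boolean map of dropped coefficients.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]

    # Forward 2-D FFT over the spatial axes, shifted so DC sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))

    # Normalized radial distance of every coefficient from the DC term.
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h // 2, xx - w // 2)
    radius = radius / radius.max()

    # Band selector. These thresholds are illustrative assumptions.
    bands = {
        "low": (0.0, 1 / 3),
        "mid": (1 / 3, 2 / 3),
        "high": (2 / 3, 1.0 + 1e-9),
        "all": (0.0, 1.0 + 1e-9),
    }
    lo, hi = bands[band]
    in_band = (radius >= lo) & (radius < hi)

    # Drop each in-band coefficient independently with probability `ratio`.
    drop = in_band & (rng.random((h, w)) < ratio)
    keep = ~drop
    if image.ndim == 3:
        keep = keep[..., None]  # share one mask across channels

    # Inverse FFT. The random mask is not conjugate-symmetric, so the result
    # is complex in general; keeping the real part is a simplification.
    masked = np.fft.ifft2(np.fft.ifftshift(spectrum * keep, axes=(0, 1)), axes=(0, 1))
    return masked.real, drop


if __name__ == "__main__":
    demo = np.random.default_rng(0).random((64, 64, 3))
    augmented, dropped = frequency_mask(demo, ratio=0.15, band="all")
    print(augmented.shape, float(dropped.mean()))
```

In a training pipeline, a transform like this would be applied only to training images; as the paper notes, masking is not used at test time.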

Paper Structure

This paper contains 22 sections, 10 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Our proposed training augmentation for universal deepfake detection, frequency masking, compared with spatial masking and geometric transformations. (i) Frequency-domain masking: The input image $I(x, y)$ is transformed to the frequency domain $F(u, v)$ via FFT. Guided by a frequency band selector, specific frequencies in $F(u, v)$ are nullified to produce $M(u, v)$. The inverse FFT yields the masked image $I'(x, y)$, which is used to train the detector and improve its generalization. (ii) Spatial-domain masking: The input image is masked at the pixel or patch level, occluding local regions while leaving frequency artifacts intact. (iii) Geometric transformations: The input undergoes spatial perturbations (e.g., translation), altering composition without affecting frequency-domain patterns. Remark: Masking and transformations are applied only during supervised training to encourage generalizable representations. They are not used during testing.
  • Figure 2: Comparison of frequency patterns from real images (top row) and AI-generated images (bottom row) using Fourier analysis. The images were processed to remove noise and standardized before analysis. Their frequency spectra were then averaged across multiple samples to highlight consistent patterns. The bottom row reveals distinct artificial patterns not found in real images, such as repetitive grid-like structures and unusual high-frequency patterns. The color intensity represents the strength of these frequency components, with brighter areas indicating stronger artificial signatures.
  • Figure 3: The performance of different augmentation types is evaluated in terms of mean Average Precision (mAP) and Area Under the Receiver Operating Characteristic Curve (AUROC). Pixel, patch, and frequency masking are applied with a 15% masking ratio, while geometric transformations use 27$^\circ$ (15% of 180$^\circ$) random rotation and $\pm$15% translation. Among these, frequency-based masking achieves the highest mAP and AUROC, demonstrating its superior effectiveness compared to other masking approaches.
  • Figure 4: Performance of combined augmentation strategies: Rotate+Translate, Rotate+Frequency, Translate+Frequency, Rotate+Translate+Frequency, and standalone Frequency masking. The Translate+Frequency combination achieves the highest performance, suggesting complementary benefits between spatial translation and spectral masking.
  • Figure : Real