Beyond Spectral Peaks: Interpreting the Cues Behind Synthetic Image Detection

Sara Mandelli, Diego Vila-Portela, David Vázquez-Padín, Paolo Bestagini, Fernando Pérez-González

TL;DR

This paper questions the assumption that frequency-domain spectral peaks are central cues used by deep learning detectors for synthetic image detection. It introduces peak-removal experiments in the Fourier domain and a simple linear, peak-based baseline to explicitly test detector reliance on these artifacts. The results show that most state-of-the-art detectors are largely not dependent on spectral peaks, while a straightforward peak-based detector achieves high accuracy, highlighting the value of interpretable methods. The findings motivate hybrid approaches that combine the transparency of linear methods with the power of deep learning for more trustworthy forensic tools.

Abstract

Over the years, the forensics community has proposed several deep learning-based detectors to mitigate the risks of generative AI. Recently, frequency-domain artifacts (particularly periodic peaks in the magnitude spectrum) have received significant attention, as they have often been considered a strong indicator of synthetic image generation. However, state-of-the-art detectors are typically used as black boxes, and it remains unclear whether they truly rely on these peaks. This limits their interpretability and trustworthiness. In this work, we conduct a systematic study to address this question. We propose a strategy to remove spectral peaks from images and analyze the impact of this operation on several detectors. In addition, we introduce a simple linear detector that relies exclusively on frequency peaks, providing a fully interpretable baseline free from the confounding influence of deep learning. Our findings reveal that most detectors are not fundamentally dependent on spectral peaks, challenging a widespread assumption in the field and paving the way for more transparent and reliable forensic tools.
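To make the idea of a linear detector that looks only at frequency peaks more concrete, here is a minimal sketch in Python with NumPy. The abstract does not say which features or training procedure the authors use, so everything below is an assumption for illustration: the hypothetical helper `peak_features` reads the log-magnitude at bins that are multiples of `size / period`, and `fit_linear` fits a ridge-regularized least-squares classifier on labels of +1 (synthetic) and -1 (real).

```python
import numpy as np

def peak_features(image, period=8):
    """Log-magnitude at the assumed peak bins, relative to the image's median level."""
    h, w = image.shape
    log_mag = np.log(np.abs(np.fft.fft2(image)) + 1e-8)
    rows = (np.arange(period) * h) // period  # bins k * h / period (assumption)
    cols = (np.arange(period) * w) // period
    feats = log_mag[np.ix_(rows, cols)].ravel()[1:]  # first entry is DC, drop it
    return feats - np.median(log_mag)

def _with_bias(features):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([features, np.ones((len(features), 1))])

def fit_linear(features, labels, ridge=1.0):
    """Ridge least squares. Mirrored bins of a real image share the same
    magnitude, so the ridge term keeps the solve well-posed."""
    X = _with_bias(features)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ labels)

def score(features, weights):
    """Linear score; positive values lean towards the 'synthetic' label."""
    return _with_bias(features) @ weights
```

Because the score is a weighted sum of peak strengths, each weight can be read directly as the contribution of one frequency bin, which is the kind of transparency the abstract contrasts with black-box detectors.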

Paper Structure

This paper contains 6 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Fourier transform analysis of synthetic images generated, from left to right, by Stable Diffusion 3.5, Flux 1.1Pro, and DALL$\cdot$E 3.
  • Figure 2: Average Fourier spectra (magnitude, in logarithmic scale) of synthetic images generated with Midjourney (first column), DALL$\cdot$E 3 (third column), and Stable Diffusion XL (fifth column), before and after peak removal with periodicity $P = 8$. Best viewed in electronic format.
  • Figure 3: From left to right: average Fourier spectrum (magnitude, in logarithmic scale) of real images laundered through Stable Diffusion 3.5; close-up of one quadrant; close-up of the peak-removed spectrum with periodicity $P = 16$. Best viewed in electronic format.
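The captions refer to average log-magnitude spectra and to peak removal with a periodicity $P$. The following Python sketch shows one plausible way to compute both. It is not the authors' procedure, which this summary does not describe. It assumes that peaks sit at bins that are multiples of `size / period`, and it flattens each such bin to the median magnitude of its neighbourhood while keeping the original phase. The names `peak_bins`, `remove_peaks`, and `average_log_spectrum` are hypothetical.

```python
import numpy as np

def peak_bins(size, period):
    """Unshifted FFT bin indices k * size / period for k = 0 .. period - 1."""
    bins = np.round(np.arange(period) * size / period).astype(int) % size
    return np.unique(bins)

def remove_peaks(image, period=8, radius=1):
    """Replace each candidate peak bin with the median magnitude around it."""
    h, w = image.shape
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    cleaned = magnitude.copy()
    offsets = np.arange(-radius, radius + 1)
    for r in peak_bins(h, period):
        for c in peak_bins(w, period):
            if r == 0 and c == 0:
                continue  # leave the DC term (mean brightness) untouched
            # Neighbourhood taken from the original magnitude, with wrap-around
            patch = magnitude[np.ix_((r + offsets) % h, (c + offsets) % w)]
            cleaned[r, c] = np.median(patch)
    # Recombine with the original phase and drop any tiny imaginary residue
    return np.real(np.fft.ifft2(cleaned * np.exp(1j * phase)))

def average_log_spectrum(images):
    """Mean centred log-magnitude spectrum over equally sized images."""
    spectra = [np.log(np.abs(np.fft.fftshift(np.fft.fft2(im))) + 1e-8)
               for im in images]
    return np.mean(spectra, axis=0)
```

Comparing `average_log_spectrum` on a set of images before and after `remove_peaks` gives the kind of side-by-side view the captions describe, with `period` playing the role of $P$.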