FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error
Beilin Chu, Xuan Xu, Xin Wang, Yufei Zhang, Weike You, Linna Zhou
TL;DR
FIRE is a frequency-guided reconstruction-error detector for diffusion-generated images. It exploits the observation that real images contain mid-band frequency information that diffusion models struggle to reconstruct. FIRE refines mid-frequency masks with FMRE and compares reconstruction errors before and after removing mid-band content. It is trained end to end with a latent-diffusion-model encoder–decoder, which improves generalization to unseen diffusion models. Experiments on DiffusionForensics and a self-collected dataset show FIRE outperforming state-of-the-art baselines and remaining robust under common perturbations. The result is a practical, generalizable detector of diffusion-generated content that better aligns the reconstruction process with the detection task.
Abstract
The rapid advancement of diffusion models has significantly improved high-quality image generation, making generated content increasingly difficult to distinguish from real images and raising concerns about potential misuse. In this paper, we observe that diffusion models struggle to accurately reconstruct mid-band frequency information in real images, suggesting that this limitation could serve as a cue for detecting diffusion-generated images. Motivated by this observation, we propose a novel method called Frequency-guided Reconstruction Error (FIRE), which, to the best of our knowledge, is the first to investigate the influence of frequency decomposition on reconstruction error. FIRE assesses the variation in reconstruction error before and after frequency decomposition, offering a robust method for identifying diffusion-generated images. Extensive experiments show that FIRE generalizes effectively to unseen diffusion models and maintains robustness against diverse perturbations.
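The idea of comparing reconstruction error before and after removing mid-band content can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the fixed radial band-stop mask stands in for the learned mask refinement, the cutoffs `low=0.1` and `high=0.3` are arbitrary illustrative values, and `reconstruct` is a hypothetical placeholder for whatever encoder–decoder performs the reconstruction. All function names are assumptions introduced here.

```python
import numpy as np

def radial_band_mask(shape, low, high):
    """Boolean mask of FFT bins whose radial frequency lies in [low, high).

    Frequencies are in cycles per pixel, as returned by np.fft.fftfreq.
    """
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return (radius >= low) & (radius < high)

def remove_mid_band(image, low=0.1, high=0.3):
    """Zero the mid-band of a 2-D grayscale image (illustrative cutoffs)."""
    spectrum = np.fft.fft2(image)
    spectrum[radial_band_mask(image.shape, low, high)] = 0
    # The mask is symmetric under frequency negation, so the inverse
    # transform is real up to rounding error.
    return np.fft.ifft2(spectrum).real

def reconstruction_error(image, reconstruct):
    """Per-pixel absolute error between an image and its reconstruction."""
    return np.abs(image - reconstruct(image))

def error_variation(image, reconstruct, low=0.1, high=0.3):
    """Change in reconstruction error once mid-band content is removed.

    `reconstruct` is a hypothetical callable (image -> image); a real
    pipeline would plug in a pretrained encoder-decoder here.
    """
    filtered = remove_mid_band(image, low, high)
    before = reconstruction_error(image, reconstruct)
    after = reconstruction_error(filtered, reconstruct)
    return before - after
```

As a toy check, passing `remove_mid_band` itself as `reconstruct` mimics a reconstructor that loses mid-band detail: an image containing mid-band content then yields a large error variation, while an image with only low-frequency content yields a variation near zero. A real detector would feed this signal to a learned classifier rather than thresholding it directly.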
