A Single Simple Patch is All You Need for AI-generated Image Detection

Jiaxuan Chen, Jieteng Yao, Li Niu

TL;DR

This work addresses the poor generalization of AI-generated image detectors to unseen generators by exploiting the noise fingerprints of a single simple patch. The authors propose the Single Simple Patch (SSP) network, which selects the patch with minimal texture diversity, extracts its noise via SRM high-pass filters, and classifies with a ResNet-50. To cope with degraded image quality, they add an enhancement module and a perception module that guide deblurring and decompression, respectively, using learned task embeddings. Experiments on GenImage and ForenSynths show state-of-the-art cross-generator performance and practical robustness, with notable improvements over PatchCraft and pretrained-model baselines while maintaining efficiency. The approach highlights the value of camera-origin noise as a robust cue for forgery detection and offers a lightweight, scalable solution for real-world deployment.

Abstract

The recent development of generative models has unleashed the potential to generate hyper-realistic fake images. To prevent the malicious use of fake images, AI-generated image detection aims to distinguish fake images from real images. However, existing methods suffer from a severe performance drop when detecting images generated by unseen generators. We find that generative models tend to focus on generating patches with rich textures to make the images more realistic, while neglecting the hidden noise caused by camera capture that is present in simple patches. In this paper, we propose to exploit the noise pattern of a single simple patch to identify fake images. Furthermore, because performance declines on low-quality generated images, we introduce an enhancement module and a perception module to remove the interfering information. Extensive experiments demonstrate that our method achieves state-of-the-art performance on public benchmarks.
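The patch-selection idea can be illustrated with a minimal sketch. It assumes a grayscale image stored as nested lists, non-overlapping square patches, and a texture-diversity score defined as the sum of absolute differences between neighbouring pixels. The names `texture_diversity` and `simplest_patch` and the exact scoring rule are illustrative assumptions, not the authors' implementation.

```python
def texture_diversity(patch):
    """Assumed score: sum of absolute differences between horizontally,
    vertically, and diagonally adjacent pixels. Lower means simpler."""
    h, w = len(patch), len(patch[0])
    total = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(patch[y][x] - patch[y][x + 1])
            if y + 1 < h:
                total += abs(patch[y][x] - patch[y + 1][x])
            if x + 1 < w and y + 1 < h:
                total += abs(patch[y][x] - patch[y + 1][x + 1])
                total += abs(patch[y][x + 1] - patch[y + 1][x])
    return total


def simplest_patch(image, patch_size):
    """Return (top, left, patch) for the non-overlapping patch with the
    lowest texture-diversity score. Ties keep the first patch found."""
    h, w = len(image), len(image[0])
    best = None
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            score = texture_diversity(patch)
            if best is None or score < best[0]:
                best = (score, top, left, patch)
    if best is None:
        raise ValueError("image is smaller than patch_size")
    _, top, left, patch = best
    return top, left, patch


# Toy 8x8 image: a checkerboard (rich texture) everywhere except a flat
# 4x4 block in the bottom-right corner.
toy_image = [[128 if (y >= 4 and x >= 4) else 255 * ((x + y) % 2)
              for x in range(8)] for y in range(8)]
top, left, patch = simplest_patch(toy_image, 4)
print(top, left)
```

On the toy image the flat block is chosen because its score is zero while every checkerboard patch scores far higher. A real pipeline would typically run this on decoded image arrays with a larger patch size.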

Paper Structure

This paper contains 20 sections, 6 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: The overall architecture of our proposed method, which consists of three parts: the enhancement module, the perception module, and the single simple patch (SSP) network. We first extract the simplest patch from the original image. Then, we use the enhancement module and the perception module to obtain a high-quality patch, which is sent to the SSP network.
  • Figure 2: In each row, from left to right, we show the image, the simplest patch, and the SRM [fridrich2012rich] outputs using 3 high-pass filters. The top three rows are fake images generated by Midjourney [Midjourney], while the bottom three rows are real images from ImageNet [deng2009imagenet].
  • Figure 3: Left: The robustness of different methods to blur and compression. Right: Example blurry or compressed images.
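
The SRM step mentioned in Figure 2 can be sketched as follows. The three 5x5 kernels below are the high-pass filters commonly used as an SRM front end in image-forensics code. Whether the paper uses exactly these kernels is an assumption, and the helpers `correlate_valid` and `srm_residuals` are hypothetical names. The sketch computes only the noise residuals of a grayscale patch and omits the ResNet-50 classifier.

```python
# Three 5x5 high-pass kernels, each stored with its normalising divisor.
# Assumption: these are the widely used SRM kernels; the paper's exact
# filters may differ.
SRM_KERNELS = [
    ([[0, 0, 0, 0, 0],
      [0, -1, 2, -1, 0],
      [0, 2, -4, 2, 0],
      [0, -1, 2, -1, 0],
      [0, 0, 0, 0, 0]], 4),
    ([[-1, 2, -2, 2, -1],
      [2, -6, 8, -6, 2],
      [-2, 8, -12, 8, -2],
      [2, -6, 8, -6, 2],
      [-1, 2, -2, 2, -1]], 12),
    ([[0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0],
      [0, 1, -2, 1, 0],
      [0, 0, 0, 0, 0],
      [0, 0, 0, 0, 0]], 2),
]


def correlate_valid(patch, kernel, divisor):
    """Slide the kernel over the patch without padding ('valid' mode)
    and divide each response by the kernel's divisor."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(patch), len(patch[0])
    out = []
    for y in range(h - kh + 1):
        row = []
        for x in range(w - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * patch[y + i][x + j]
            row.append(acc / divisor)
        out.append(row)
    return out


def srm_residuals(patch):
    """Return one residual map per kernel for a grayscale patch."""
    return [correlate_valid(patch, k, d) for k, d in SRM_KERNELS]


# A perfectly flat patch carries no high-frequency content, so every
# residual is zero; an isolated bright pixel produces a non-zero response.
flat = [[10] * 5 for _ in range(5)]
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1
print(srm_residuals(flat))
print(srm_residuals(impulse))
```

Each kernel's coefficients sum to zero, so smooth image content is suppressed and only the residual noise remains, which is the signal the detector is described as classifying.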