A Sanity Check for AI-generated Image Detection

Shilin Yan; Ouxiang Li; Jiayin Cai; Yanbin Hao; Xiaolong Jiang; Yao Hu; Weidi Xie

TL;DR

This paper questions whether AI-generated image detection is truly solved and introduces the Chameleon dataset to test detectors under realistic, human-perceptual challenges. It proposes AIDE, a hybrid detector that fuses low-level patch statistics via DCT/SRM with high-level semantic embeddings from OpenCLIP, forming a mixture-of-experts classifier. AIDE achieves state-of-the-art results on public benchmarks such as AIGCDetectBenchmark and GenImage, while revealing substantial gaps in generalization on the Chameleon dataset, underscoring the need for more robust evaluation. Overall, the work advocates for realistic benchmarking and hybrid-feature detectors to better anticipate real-world performance.

Abstract

With the rapid development of generative models, discerning AI-generated content has attracted increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. Upon analysis, almost all models classify AI-generated images as real ones. We then propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. First, to capture high-level semantics, we use CLIP to compute the visual embedding, which enables the model to discern AI-generated images based on semantic or contextual information. Second, we select the highest-frequency and the lowest-frequency patches in the image and compute low-level patchwise features, aiming to detect AI-generated images through low-level artifacts such as noise patterns and anti-aliasing. On existing benchmarks, namely AIGCDetectBenchmark and GenImage, AIDE achieves improvements of +3.5% and +4.6%, respectively, over state-of-the-art methods. On our proposed, more challenging Chameleon benchmark, it also achieves promising results, although the problem of detecting AI-generated images remains far from solved.
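The patch-selection step described in the abstract can be illustrated with a minimal sketch. It assumes a grayscale image stored as a 2D NumPy array, non-overlapping square patches, and a scoring rule in which each absolute DCT coefficient is weighted by the sum of its row and column frequency indices. The function names (`dct_matrix`, `frequency_score`, `select_patches`) and this particular weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # DC row uses a different scale factor
    return mat


def frequency_score(patch: np.ndarray) -> float:
    """Weighted sum of absolute 2D DCT coefficients of a square patch.

    Assumption: the weight of coefficient (u, v) is u + v, so higher
    frequencies contribute more and the DC term contributes nothing.
    """
    n = patch.shape[0]
    d = dct_matrix(n)
    coeffs = d @ patch @ d.T         # separable 2D DCT-II
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    weights = (u + v).astype(float)
    return float(np.sum(weights * np.abs(coeffs)))


def select_patches(image: np.ndarray, patch_size: int, k: int):
    """Split the image into non-overlapping patches and return the k
    highest-scoring and the k lowest-scoring patches."""
    h, w = image.shape
    patches = [
        image[r:r + patch_size, c:c + patch_size]
        for r in range(0, h - patch_size + 1, patch_size)
        for c in range(0, w - patch_size + 1, patch_size)
    ]
    scores = np.array([frequency_score(p) for p in patches])
    order = np.argsort(scores)       # ascending by frequency score
    lowest = [patches[i] for i in order[:k]]
    highest = [patches[i] for i in order[-k:]]
    return highest, lowest
```

In a full detector, the selected patches would be passed to a low-level feature extractor, and the resulting features combined with the CLIP embedding; those stages are outside the scope of this sketch.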

Paper Structure (30 sections, 8 equations, 4 figures, 12 tables)

Introduction
Related Works
Chameleon Dataset
Problem Formulation
Chameleon Dataset
Dataset Collection
Dataset Curation
Dataset Annotation
Dataset Comparison
Methodology
Patchwise Feature Extraction
Semantic Feature Embedding
Discriminator
Experiments
Experimental Details
...and 15 more sections

Figures (4)

Figure 1: Comparison of Chameleon with existing benchmarks. We visualize two contemporary AI-generated image benchmarks, namely (a) AIGCDetect Benchmark [wang2020cnn] and (b) GenImage Benchmark [zhu2024genimage], where all images are generated from publicly available generators, such as ProGAN (GAN-based), SD v1.4 (DM-based) and Midjourney (commercial API). These images are generated unconditionally or conditioned on simple prompts (e.g., photo of a plane) without delicate manual adjustments, and therefore tend to exhibit obvious artifacts in consistency and semantics (marked with red boxes). In contrast, our Chameleon dataset in (c) aims to simulate real-world scenarios by collecting diverse images from online websites, where these images have been carefully adjusted by photographers and AI artists.
Figure 2: Overview of AIDE. It consists of a Patchwise Feature Extraction (PFE) module and a Semantic Feature Embedding (SFE) module, combined in a mixture-of-experts manner. In the PFE module, the DCT Scoring module first calculates the DCT coefficients for each smashed patch and then performs a weighted sum of these coefficients (weights gradually increase as the color goes from light to dark).
Figure 3: Hyperparameter ablation of patch size and patch number introduced in our method.
Figure 4: Visualization of the effectiveness of PFE and SFE Modules with Grad-CAM [selvaraju2017grad].
