Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection

Chenming Zhou, Jiaan Wang, Yu Li, Lei Li, Juan Cao, Sheng Tang

TL;DR

The paper addresses a known weakness of AI-generated image detectors: they fail to generalize to unseen generators because they rely on semantic cues. It introduces a pixel-level mapping preprocessing step that disrupts low-frequency semantic information while amplifying high-frequency generative artifacts; both the fixed and the random mapping variants perform robustly. In cross-model and cross-distribution experiments on GANs and diffusion models, the approach achieves state-of-the-art generalization, supported by quantitative metrics and qualitative analyses (t-SNE, spectra). The findings point to a practical, computationally lightweight way to make forensic detectors more robust to evolving synthetic media, which is relevant to real-world authenticity verification.

Abstract

The rapid evolution of generative technologies necessitates reliable methods for detecting AI-generated images. A critical limitation of current detectors is their failure to generalize to images from unseen generative models, as they often overfit to source-specific semantic cues rather than learning universal generative artifacts. To overcome this, we introduce a simple yet remarkably effective pixel-level mapping pre-processing step to disrupt the pixel value distribution of images and break the fragile, non-essential semantic patterns that detectors commonly exploit as shortcuts. This forces the detector to focus on more fundamental and generalizable high-frequency traces inherent to the image generation process. Through comprehensive experiments on GAN and diffusion-based generators, we show that our approach significantly boosts the cross-generator performance of state-of-the-art detectors. Extensive analysis further verifies our hypothesis that the disruption of semantic cues is the key to generalization.

Paper Structure

This paper contains 34 sections, 3 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: The impact of different image processing methods on ImageNet classification results.
  • Figure 2: Visualization results of various semantic-reduction methods.
  • Figure 3: (a) The framework pipeline of the proposed method. The input image first passes through the pixel-level mapping module before being sent into the classification head. (b) The fixed pixel-level mapping module applies the same fixed mapping to all three channels of images. (c) The random pixel-level mapping module applies a different random mapping to the three channels of each image.
  • Figure 4: (a) Fixed pixel-level mapping table, which remains the same for each channel of each sample. (b) Four examples of random pixel-level mapping tables, which maintain randomness for each channel of each sample.
  • Figure 5: t-SNE results. (a) Pixel-level mapping on GAN models. (b) NPR on GAN models. (c) Pixel-level mapping on diffusion models. (d) NPR on diffusion models.
  • ...and 6 more figures
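
The captions for Figures 3 and 4 describe two variants of the mapping module: a fixed table shared by all channels of every sample, and a random table drawn separately for each channel of each image. The sketch below illustrates that distinction with NumPy lookup tables. It is not the authors' implementation: the helper names (`make_table`, `apply_fixed_mapping`, `apply_random_mapping`) are hypothetical, and it assumes that a mapping table is a permutation of the 256 intensity levels, which the text above does not confirm.

```python
import numpy as np


def make_table(rng):
    """Hypothetical mapping table: a random permutation of the 256 intensity levels.

    Assumption: the listing above does not say how the tables are built.
    """
    return rng.permutation(256).astype(np.uint8)


def apply_fixed_mapping(image, table):
    """Apply one shared table to every channel of the image (the Figure 3b reading)."""
    # Indexing the table with the image looks up each pixel value independently.
    return table[image]


def apply_random_mapping(image, rng):
    """Draw a fresh table for each channel of this image (the Figure 3c reading)."""
    out = np.empty_like(image)
    for c in range(image.shape[-1]):
        out[..., c] = make_table(rng)[image[..., c]]
    return out


# Toy 4x4 RGB image with 8-bit values, standing in for a real input.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

# The fixed table is created once and reused for every sample.
fixed_table = make_table(np.random.default_rng(42))
fixed_out = apply_fixed_mapping(image, fixed_table)

# The random variant draws new tables on every call.
random_out = apply_random_mapping(image, rng)
```

Under the permutation assumption the mapping loses no information (`np.argsort(fixed_table)[fixed_out]` recovers the input), so any change in detector behavior would come from rearranging the pixel value distribution rather than from discarding pixel data.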