How Noise Benefits AI-generated Image Detection
Jiazhen Yan, Ziqiang Li, Fan Wang, Kai Zeng, Zhangjie Fu
TL;DR
This work tackles the open-world generalization problem in AI-generated image detection, where detectors overfit to shortcuts in the training data. It introduces PiN-CLIP, a variational, noise-driven framework that jointly learns a noise generator and a CLIP-based detector. The generator uses cross-attention to produce artifact-conditioned perturbations that maximize the mutual information $I(\mathcal{T};\mathcal{E})$ and reduce the conditional entropy $H(\mathcal{T}|\mathcal{E})$. By injecting mean-guided and curvature-aware noise into the feature space, PiN-CLIP suppresses shortcut-prone directions while amplifying stable forensic cues. It achieves state-of-the-art cross-domain performance on GenImage and AIGIBench, with a reported improvement of 5.4 in average accuracy over existing approaches, and it remains robust to common perturbations such as JPEG compression and blur. The result is a principled, noise-driven paradigm for improving multimedia forensics in open-world settings.
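The two information-theoretic goals above are linked by a standard identity. As a brief sketch, assuming $\mathcal{T}$ denotes the detection task and $\mathcal{E}$ the injected noise (the summary does not define these symbols, so this reading is an assumption):

$$I(\mathcal{T};\mathcal{E}) = H(\mathcal{T}) - H(\mathcal{T}|\mathcal{E})$$

If the task entropy $H(\mathcal{T})$ does not depend on the noise, then maximizing $I(\mathcal{T};\mathcal{E})$ is equivalent to minimizing $H(\mathcal{T}|\mathcal{E})$, and noise with $I(\mathcal{T};\mathcal{E}) > 0$ lowers the uncertainty of the task rather than raising it.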
Abstract
The rapid advancement of generative models has made real and synthetic images increasingly indistinguishable. Although extensive efforts have been devoted to detecting AI-generated images, out-of-distribution generalization remains a persistent challenge. We trace this weakness to spurious shortcuts exploited during training, and we observe that small feature-space perturbations can mitigate shortcut dominance. To address this problem in a more controllable manner, we propose Positive-Incentive Noise for CLIP (PiN-CLIP), which jointly trains a noise generator and a detection network under a variational positive-incentive principle. Specifically, we construct positive-incentive noise in the feature space via cross-attention fusion of visual and categorical semantic features. During optimization, the noise is injected into the feature space to fine-tune the visual encoder, suppressing shortcut-sensitive directions while amplifying stable forensic cues, which enables the extraction of more robust and generalizable artifact representations. Comparative experiments are conducted on an open-world dataset comprising synthetic images generated by 42 distinct generative models. Our method achieves new state-of-the-art performance, with an improvement of 5.4 in average accuracy over existing approaches.
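The cross-attention noise construction described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the random projection matrices, the Gaussian reparameterization, and the toy dimensions are all assumptions made for illustration, and the joint training of the generator and detector is not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, text, w_q, w_k, w_v):
    """Visual tokens (n, d) attend to category text embeddings (c, d)."""
    q = visual @ w_q                          # queries from visual features
    k = text @ w_k                            # keys from category features
    v = text @ w_v                            # values from category features
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n, c) scaled dot products
    return softmax(scores, axis=-1) @ v       # (n, d) fused features

def generate_noise(visual, text, params, rng):
    """Hypothetical noise generator: the fused features parameterize a Gaussian."""
    fused = cross_attention(visual, text,
                            params["w_q"], params["w_k"], params["w_v"])
    mu = fused @ params["w_mu"]               # noise mean (assumed linear head)
    log_var = fused @ params["w_logvar"]      # noise log-variance (assumed head)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps   # reparameterized sample

def init_params(d, rng, scale=0.02):
    # Random stand-ins for weights that would be learned in practice.
    names = ["w_q", "w_k", "w_v", "w_mu", "w_logvar"]
    return {name: scale * rng.standard_normal((d, d)) for name in names}

rng = np.random.default_rng(0)
d = 16
visual = rng.standard_normal((4, d))   # stand-in for 4 visual feature tokens
text = rng.standard_normal((2, d))     # stand-in for 2 category embeddings
params = init_params(d, rng)

noise = generate_noise(visual, text, params, rng)
noisy_visual = visual + noise          # noise injected into the feature space
```

In an actual system the weights would be trained together with the detector so that the injected noise helps the detection objective; here they are random, so the sketch only shows the data flow and tensor shapes.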
