RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection
Zhiyuan He, Pin-Yu Chen, Tsung-Yi Ho
TL;DR
This work tackles AI-generated image detection without training detectors. It introduces RIGID, a training-free and model-agnostic framework that probes the sensitivity of real versus generated images by perturbing inputs and measuring cosine similarity in a pretrained feature space, specifically $\mathrm{sim} = \cos(f(x), f(x+\lambda\delta))$. The authors provide a theoretical justification via Stein's lemma, showing generated images exhibit larger gradient norms in the smoothed similarity metric. Extensive experiments across ImageNet, LSUN-Bedroom, and GenImage demonstrate that RIGID outperforms both training-based and training-free baselines, generalizes across generation methods, and remains robust to common image corruptions, making it a cost-efficient and practical solution for robust AI-generated image detection.
Abstract
The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen generated images. In this paper, we propose a training-free method to distinguish between real and AI-generated images. We first observe that real images are more robust to tiny noise perturbations than AI-generated images in the representation space of vision foundation models. Based on this observation, we propose RIGID, a training-free and model-agnostic method for robust AI-generated image detection. RIGID is a simple yet effective approach that identifies whether an image is AI-generated by comparing the representation similarity between the original and the noise-perturbed counterpart. Our evaluation on a diverse set of AI-generated images and benchmarks shows that RIGID significantly outperforms existing training-based and training-free detectors. In particular, the average performance of RIGID exceeds the current best training-free method by more than 25%. Importantly, RIGID exhibits strong generalization across different image generation methods and robustness to image corruptions.
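The perturb-and-compare idea can be illustrated with a minimal Python sketch of the score $\cos(f(x), f(x+\lambda\delta))$. It is an illustration under stated assumptions, not the authors' implementation: `make_toy_encoder` is a hypothetical stand-in for the pretrained encoder $f$ (a fixed random projection, not a real foundation model), the Gaussian choice of $\delta$, the averaging over `n_samples` draws, and the `threshold` decision rule are all assumptions, and every function name is invented for illustration.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two feature vectors (flattened to 1-D)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def perturbation_similarity(image, encoder, noise_level=0.05, n_samples=1, seed=0):
    """Mean of cos(f(x), f(x + lambda * delta)) over random draws of delta.

    Assumptions: delta is standard Gaussian noise with the image's shape,
    and `noise_level` plays the role of lambda. Averaging over several
    draws is an illustrative choice, not something stated in the text.
    """
    rng = np.random.default_rng(seed)
    image = np.asarray(image, dtype=np.float64)
    clean_features = encoder(image)
    sims = []
    for _ in range(n_samples):
        delta = rng.standard_normal(image.shape)
        noisy_features = encoder(image + noise_level * delta)
        sims.append(cosine_similarity(clean_features, noisy_features))
    return float(np.mean(sims))


def make_toy_encoder(input_dim, feature_dim=32, seed=0):
    """Hypothetical placeholder for a pretrained encoder f.

    A fixed random projection followed by tanh. In practice this would be
    replaced by the feature extractor of a vision foundation model.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((feature_dim, input_dim)) / np.sqrt(input_dim)
    return lambda x: np.tanh(weights @ np.asarray(x, dtype=np.float64).ravel())


def is_generated(image, encoder, threshold, **kwargs):
    """Flag an image as AI-generated when its similarity falls below `threshold`.

    The threshold is an assumed free parameter that would have to be
    calibrated on held-out real images.
    """
    return perturbation_similarity(image, encoder, **kwargs) < threshold
```

With a real encoder substituted for the placeholder, the abstract's observation implies that real images should score closer to 1 than generated ones, so a lower score counts as evidence that the image is generated.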
