Training-free Detection of AI-generated images via Cropping Robustness
Sungik Choi, Hankook Lee, Moontae Lee
TL;DR
The paper addresses the need for universal, training-free detection of AI-generated images. It introduces WaRPAD, which builds on a base score, HFwav, that measures how sensitive an image embedding is to perturbations along high-frequency directions extracted with Haar wavelets. WaRPAD aggregates this score across patches produced by a deterministic RescaleNPatchify step that simulates the effect of RandomResizedCrop. Experiments on Synthbuster, GenImage, and Deepfake-LSUN-Bedroom show WaRPAD achieving state-of-the-art performance among training-free detectors and robustness to test-time corruptions, and the method applies across multiple self-supervised backbones (e.g., DINOv2, CLIP, SwAV). The work contributes a practical, data-free approach to AI-generated image detection with implications for robust web-scale filtering and potential multimodal extensions, while acknowledging computational cost and backbone dependence as areas for future work.
Abstract
AI-generated image detection has become crucial with the rapid advancement of vision-generative models. Instead of training detectors tailored to specific datasets, we study a training-free approach leveraging self-supervised models without requiring prior data knowledge. These models, pre-trained with augmentations like RandomResizedCrop, learn to produce consistent representations across varying resolutions. Motivated by this, we propose WaRPAD, a training-free AI-generated image detection algorithm based on self-supervised models. Since neighborhood pixel differences in images are highly sensitive to resizing operations, WaRPAD first defines a base score function that quantifies the sensitivity of image embeddings to perturbations along high-frequency directions extracted via Haar wavelet decomposition. To simulate robustness against cropping augmentation, we rescale each image to a multiple of the model's input size, divide it into smaller patches, and compute the base score for each patch. The final detection score is then obtained by averaging the scores across all patches. We validate WaRPAD on real datasets of diverse resolutions and domains, and on images generated by 23 different generative models. Our method consistently achieves competitive performance and demonstrates strong robustness to test-time corruptions. Furthermore, as invariance to RandomResizedCrop is a common training scheme across self-supervised models, we show that WaRPAD is applicable across self-supervised models.
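
The pipeline described in the abstract (a high-frequency base score, rescaling to a multiple of the input size, patch splitting, and averaging) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several parts are assumptions made for illustration: the function names, the perturbation strength `alpha`, the use of one minus cosine similarity as the sensitivity measure, the nearest-neighbor resize, and the `embed` callable standing in for a self-supervised backbone. The only part that follows directly from the Haar transform itself is that removing the low-pass (LL) band of a single-level decomposition is the same as subtracting each 2x2 block mean.

```python
import numpy as np

def haar_high_freq(img):
    """High-frequency part of a single-level 2D Haar decomposition.

    img: float array of shape (H, W, C) with even H and W.
    Reconstructing from the LL band alone gives the 2x2 block mean,
    so the high-frequency residual is the image minus that mean.
    """
    h, w, c = img.shape
    blocks = img.reshape(h // 2, 2, w // 2, 2, c)
    low = blocks.mean(axis=(1, 3), keepdims=True)
    return (blocks - low).reshape(h, w, c)

def base_score(img, embed, alpha=0.1):
    """Assumed sensitivity measure: 1 - cosine similarity between the
    embedding of img and of img perturbed along its high-frequency part."""
    z = embed(img)
    z_pert = embed(img + alpha * haar_high_freq(img))
    denom = np.linalg.norm(z) * np.linalg.norm(z_pert) + 1e-12
    return 1.0 - float(np.dot(z, z_pert)) / denom

def rescale_and_patchify(img, input_size, n):
    """Resize to (n*input_size, n*input_size) and split into n*n patches.
    Nearest-neighbor resizing is an assumption made to stay dependency-free."""
    h, w, c = img.shape
    side = n * input_size
    rows = np.arange(side) * h // side
    cols = np.arange(side) * w // side
    resized = img[rows][:, cols]
    patches = resized.reshape(n, input_size, n, input_size, c).swapaxes(1, 2)
    return patches.reshape(n * n, input_size, input_size, c)

def warpad_like_score(img, embed, input_size=224, n=2, alpha=0.1):
    """Average the base score over all patches of the rescaled image."""
    patches = rescale_and_patchify(img, input_size, n)
    return float(np.mean([base_score(p, embed, alpha) for p in patches]))
```

In practice `embed` would wrap a pre-trained backbone such as DINOv2 and return a 1-D feature vector per patch; any callable that maps an `(input_size, input_size, C)` array to a vector works with the sketch.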
