Training-free Detection of AI-generated images via Cropping Robustness

Sungik Choi, Hankook Lee, Moontae Lee

TL;DR

The paper addresses universal, training-free detection of AI-generated images. It introduces WaRPAD, which builds on a base score, HFwav, that measures how sensitive a self-supervised model's image embeddings are to perturbations along high-frequency directions extracted via Haar wavelet decomposition. WaRPAD averages this score over patches produced by a deterministic RescaleNPatchify step, which simulates the effect of RandomResizedCrop. Across Synthbuster, GenImage, and Deepfake-LSUN-Bedroom, WaRPAD achieves state-of-the-art performance among training-free detectors and remains robust to test-time corruptions, and it is applicable across multiple self-supervised backbones (e.g., DINOv2, CLIP, SwAV). The work contributes a practical, data-free approach to AI-generated image detection, with implications for robust web-scale filtering and potential multimodal extensions, while acknowledging computational cost and backbone dependence as areas for future work.

Abstract

AI-generated image detection has become crucial with the rapid advancement of vision-generative models. Instead of training detectors tailored to specific datasets, we study a training-free approach leveraging self-supervised models without requiring prior data knowledge. These models, pre-trained with augmentations like RandomResizedCrop, learn to produce consistent representations across varying resolutions. Motivated by this, we propose WaRPAD, a training-free AI-generated image detection algorithm based on self-supervised models. Since neighborhood pixel differences in images are highly sensitive to resizing operations, WaRPAD first defines a base score function that quantifies the sensitivity of image embeddings to perturbations along high-frequency directions extracted via Haar wavelet decomposition. To simulate robustness against cropping augmentation, we rescale each image to a multiple of the model's input size, divide it into smaller patches, and compute the base score for each patch. The final detection score is then obtained by averaging the scores across all patches. We validate WaRPAD on real datasets of diverse resolutions and domains, and images generated by 23 different generative models. Our method consistently achieves competitive performance and demonstrates strong robustness to test-time corruptions. Furthermore, as invariance to RandomResizedCrop is a common training scheme across self-supervised models, we show that WaRPAD is applicable across self-supervised models.
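The base score described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single-level Haar decomposition, an additive perturbation of the form x + alpha * HF(x) (the name `alpha` and this exact form are assumptions), and it uses a hypothetical `toy_encoder` (a fixed random projection) in place of a real self-supervised backbone.

```python
import numpy as np


def haar_high_frequency(img: np.ndarray) -> np.ndarray:
    """High-frequency part of a single-level 2D Haar decomposition.

    img: float array of shape (H, W, C) with even H and W.
    Reconstructing from the LL subband alone replaces every 2x2 block by
    its mean, so the high-frequency part is the image minus that result.
    """
    h, w, c = img.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    blocks = img.reshape(h // 2, 2, w // 2, 2, c)
    block_mean = blocks.mean(axis=(1, 3), keepdims=True)
    low = np.broadcast_to(block_mean, blocks.shape).reshape(img.shape)
    return img - low


def cosine_similarity(u: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> float:
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


def hf_sensitivity_score(img, encoder, alpha: float = 0.1) -> float:
    """Cosine similarity between embeddings of img and its perturbed copy.

    Assumption: the perturbation is additive along the high-frequency
    component, scaled by a hypothetical strength parameter alpha.
    """
    perturbed = img + alpha * haar_high_frequency(img)
    return cosine_similarity(encoder(img), encoder(perturbed))


# Hypothetical stand-in for a self-supervised model: a fixed random projection.
rng = np.random.default_rng(0)
projection = rng.standard_normal((16, 8 * 8 * 3))


def toy_encoder(x: np.ndarray) -> np.ndarray:
    return projection @ x.ravel()


flat = np.full((8, 8, 3), 0.5)    # no high-frequency content
textured = rng.random((8, 8, 3))  # strong pixel-to-pixel variation
print(hf_sensitivity_score(flat, toy_encoder))
print(hf_sensitivity_score(textured, toy_encoder))
```

A constant image has no high-frequency content, so its perturbed copy is identical and the similarity is essentially 1. Which direction of the score indicates a generated image depends on the backbone and is not something this toy example can show.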

Paper Structure

This paper contains 17 sections, 3 equations, 8 figures, 9 tables, 1 algorithm.

Figures (8)

  • Figure 1: Conceptual illustration of our method $\text{WaRPAD}$. We first rescale and patchify the given image into a batch of patches. Then, we perturb the patches along the high-frequency direction of the Haar wavelet decomposition. Our final score function is the averaged cosine similarity between the features of the perturbed and non-perturbed patches, computed by the self-supervised model.
  • Figure 2: Motivation for the Haar-wavelet perturbation sensitivity score. (a): The original image along with the designated region for high-frequency visualization, marked in red. To simulate the effect of RandomResizedCrop (RRC), we apply cropping regions indicated in green, blue, and yellow. (b): The high-frequency component of the original (uncropped) image obtained via Haar wavelet decomposition. (c), (d), (e): The corresponding high-frequency components of the RRC-transformed images, where the cropping regions are defined by the green, blue, and yellow boxes, respectively.
  • Figure 3: Effect of RescaleNPatchify. (a): Histogram of real and SDv1.4-generated data examined by $\text{HFwav}$. (b): Histogram of real and SDv1.4-generated data examined on patches augmented by $\texttt{RescaleNPatchify}$. (c): Histogram of real and SDv1.4-generated data examined on our $\text{WaRPAD}$ score function.
  • Figure 4: Visualization of the $\text{HFwav}$ score across patches. We show the patch with the highest score in red and lowest score in blue. Each image is from ImageNet (a), ADM-generated GenImage (b), LSUN (c), and ADM-generated Deepfake-LSUN-Bedroom dataset (d), respectively.
  • Figure 5: Hyperparameter analysis of $\text{WaRPAD}$. (a): AUROC performance of $\text{WaRPAD}$ with respect to $\alpha$ in the Synthbuster benchmark. (b): AUROC result with respect to $d_{\text{rescale}}$. (c): AUROC result with respect to $d_{\text{patch}}$ in the Synthbuster benchmark.
  • ...and 3 more figures
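
The rescale-and-patchify step referenced in Figures 1, 3, and 5 can be sketched as follows. This is only an illustration under stated assumptions: `d_rescale` is taken to be the side length of the rescaled square image and `d_patch` the side length of each patch, nearest-neighbor resizing stands in for whatever interpolation the authors use, and the function names and the `base_score` callable are hypothetical.

```python
import numpy as np


def rescale_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Resize an (H, W, C) image to (size, size, C) by nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def rescale_n_patchify(img: np.ndarray, d_rescale: int, d_patch: int) -> np.ndarray:
    """Deterministically rescale, then split into a non-overlapping patch grid.

    Returns an array of shape (n * n, d_patch, d_patch, C), where
    n = d_rescale // d_patch, with patches in row-major grid order.
    """
    if d_rescale % d_patch:
        raise ValueError("d_rescale must be a multiple of d_patch")
    resized = rescale_nearest(img, d_rescale)
    n = d_rescale // d_patch
    c = resized.shape[2]
    grid = resized.reshape(n, d_patch, n, d_patch, c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(n * n, d_patch, d_patch, c)


def aggregate_score(img, base_score, d_rescale: int, d_patch: int) -> float:
    """Average a per-patch base score over all patches of the rescaled image."""
    patches = rescale_n_patchify(img, d_rescale, d_patch)
    return float(np.mean([base_score(p) for p in patches]))


# Usage with a placeholder base score (mean intensity), for illustration only.
image = np.random.default_rng(0).random((6, 10, 3))
patches = rescale_n_patchify(image, d_rescale=8, d_patch=4)
print(patches.shape)
print(aggregate_score(image, lambda p: p.mean(), d_rescale=8, d_patch=4))
```

Because the grid is fixed rather than randomly sampled, the same image always yields the same patches, which makes the final averaged score reproducible.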