PRADA: Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images

Simon Damm, Jonas Ricker, Henning Petzka, Asja Fischer

TL;DR

PRADA uses the explicit likelihoods of autoregressive (AR) image generators to detect AR-generated images and attribute them to their source model. By learning a small, model-specific score function that balances conditional and unconditional token probabilities, and by aggregating token-wise scores across scales, PRADA achieves detection performance comparable to the state of the art on powerful text-to-image models and competitive results on class-to-image models. The method is highly interpretable, requires limited training data, and adapts to new AR generators without retraining a large classifier. Its practical value is as a lightweight, reliable forensic tool for establishing provenance and authenticity in the era of high-fidelity AR imagery.

Abstract

Autoregressive (AR) image generation has recently emerged as a powerful paradigm for image synthesis. Leveraging the generation principle of large language models, AR image generators can efficiently produce deceptively real-looking images, which further increases the need for reliable detection methods. However, to date there is a lack of work specifically targeting the detection of images generated by AR image generators. In this work, we present PRADA (Probability-Ratio-Based Attribution and Detection of Autoregressive-Generated Images), a simple and interpretable approach that can reliably detect AR-generated images and attribute them to their respective source model. The key idea is to inspect the ratio of a model's conditional and unconditional probability for the autoregressive token sequence representing a given image. Whenever an image is generated by a particular model, its probability ratio shows unique characteristics that are not present for images generated by other models or for real images. We exploit these characteristics for threshold-based attribution and detection by calibrating a simple, model-specific score function. Our experimental evaluation shows that PRADA is highly effective against eight class-to-image and four text-to-image models.
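
As a minimal sketch of the key idea (not the authors' implementation), the probability ratio of a token sequence can be computed in log space from per-token log-probabilities. The function name and the toy numbers are hypothetical; in practice the two inputs would presumably come from two passes of the AR generator over the image's token sequence, one with and one without the condition.

```python
import numpy as np

def log_probability_ratio(cond_logp, uncond_logp):
    """Log of the conditional/unconditional probability ratio of a token sequence.

    cond_logp[t]   : log p(x_t | x_<t, c)  (hypothetical input, one value per token)
    uncond_logp[t] : log p(x_t | x_<t)     (hypothetical input, one value per token)
    """
    cond_logp = np.asarray(cond_logp, dtype=float)
    uncond_logp = np.asarray(uncond_logp, dtype=float)
    if cond_logp.shape != uncond_logp.shape:
        raise ValueError("need one conditional and one unconditional value per token")
    # A ratio of products of probabilities becomes a sum of log differences.
    return float(np.sum(cond_logp - uncond_logp))

# Toy values for a three-token sequence (made up for illustration).
cond = [-1.0, -0.5, -2.0]
uncond = [-1.5, -1.5, -2.5]
ratio = log_probability_ratio(cond, uncond)
```

Working in log space avoids numerical underflow for long token sequences. Note that PRADA thresholds a calibrated, model-specific score rather than this raw quantity.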

Paper Structure

This paper contains 46 sections, 6 equations, 17 figures, 9 tables.

Figures (17)

  • Figure 1: Overview of our proposed method. For any given image, PRADA extracts conditional and unconditional log-likelihoods for each token and assigns a score to their balanced ratio $\Delta^\alpha(x_t,c)$. A lightweight calibration step provides a small, model-specific scoring function $f_\theta:\mathbb{R}\to\mathbb{R}$. For next-scale prediction models, the scale-wise average of token scores is linearly combined with weights $w_i$ to obtain the final PRADA score. The calibrated score $P(x)$ is highly effective for detection and model attribution, especially for powerful text-to-image models.
  • Figure 2: Distributions of features derived from log probabilities of AR image generators for real and generated images. While neither conditional probabilities (1st col), nor probability ratios (2nd col), nor ICAS (3rd col) consistently tell real and generated images apart, our PRADA score separates their distributions.
  • Figure 3: Visualization of the scale dependence. While higher scales are useful for VAR-d30, they are harmful for Infinity-2B, especially for ICAS.
  • Figure 4: Attribution performance of PRADA. We report the confusion matrices (normalized over rows and averaged over five calibration runs) for class-to-image and text-to-image models. PRADA achieves high performance across various AR image generators and is particularly effective against text-to-image models.
  • Figure 5: Robustness analysis for PRADA. We report the AUROC for images generated by class-to-image (top) and text-to-image models (bottom) under varying degrees of perturbation.
  • ...and 12 more figures
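
The pipeline described in the Figure 1 caption can be sketched roughly as follows. This is a hedged illustration only: the form of the balanced ratio (conditional log-probability minus $\alpha$ times the unconditional one), the identity placeholder for $f_\theta$, the threshold direction, and all numeric values are assumptions made for the example, since the caption does not spell them out.

```python
import numpy as np

def balanced_ratio(cond_logp, uncond_logp, alpha):
    # ASSUMPTION: the "balanced ratio" is modeled here as an alpha-weighted
    # difference of log-probabilities; the source does not give the exact form.
    return np.asarray(cond_logp, dtype=float) - alpha * np.asarray(uncond_logp, dtype=float)

def prada_like_score(cond_by_scale, uncond_by_scale, alpha, f_theta, weights):
    """Score tokens, average them per scale, then combine the scales linearly."""
    if not (len(cond_by_scale) == len(uncond_by_scale) == len(weights)):
        raise ValueError("need one weight and one pair of token arrays per scale")
    scale_means = [
        float(np.mean(f_theta(balanced_ratio(c, u, alpha))))
        for c, u in zip(cond_by_scale, uncond_by_scale)
    ]
    return float(np.dot(weights, scale_means))

# Toy example with two scales (1 token, then 2 tokens); all values are made up.
cond_by_scale = [[-1.0], [-2.0, -1.0]]
uncond_by_scale = [[-2.0], [-3.0, -2.0]]
identity = lambda delta: delta  # hypothetical stand-in for the calibrated f_theta
score = prada_like_score(cond_by_scale, uncond_by_scale,
                         alpha=0.5, f_theta=identity, weights=[1.0, 2.0])

# Threshold-based decision; the threshold value and its direction are hypothetical.
threshold = -1.0
flagged_as_generated = score > threshold
```

For a model that is not a next-scale predictor, the same sketch applies with a single scale and a single weight.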