Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images

Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm

TL;DR

The paper tackles the growing challenge of distinguishing AI-generated images from real ones and attributing their sources, especially for unseen generators. It introduces forensic self-descriptions—compact embeddings derived from multi-scale residuals produced by self-supervised predictive filters learned from real images—that capture intrinsic forensic microstructures left by image formation processes. By modeling these self-descriptions with Gaussian mixtures for real data, per-source distributions, and clustering in the embedding space, the method achieves robust zero-shot detection, open-set source attribution, and unsupervised clustering without relying on synthetic training data. Experiments across diverse real and synthetic datasets show state-of-the-art performance, including strong worst-case results and resilience to JPEG compression, highlighting practical impact for forensic analysis of synthetic media. The work provides a compact, adaptable framework with broad applicability in forensic investigations and media provenance analysis.
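
The detection step summarized above can be sketched as one-class density modeling: fit a Gaussian mixture to embeddings of real images only, then flag any image whose embedding has an unusually low likelihood. The block below is a minimal illustration under stated assumptions, not the authors' implementation. The 8-dimensional random vectors stand in for forensic self-descriptions, and the diagonal covariance, the two components, the 1st-percentile threshold, and the function names `fit_diag_gmm` and `log_likelihood` are all hypothetical choices made for the example.

```python
import numpy as np

def _log_joint(x, weights, means, variances):
    """Log of weight_k * N(x | mean_k, diag(var_k)) for every sample and component."""
    d = x.shape[1]
    diff = x[:, None, :] - means[None, :, :]
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_det = np.log(variances).sum(axis=1)
    return np.log(weights) - 0.5 * (d * np.log(2 * np.pi) + log_det + maha)

def log_likelihood(x, params):
    """Per-sample log-likelihood under the mixture (stable log-sum-exp)."""
    lj = _log_joint(x, *params)
    peak = lj.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(lj - peak).sum(axis=1, keepdims=True))).ravel()

def fit_diag_gmm(x, n_components=2, n_iter=50, seed=0, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture to the rows of x with EM."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    means = x[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + eps, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        lj = _log_joint(x, weights, means, variances)
        resp = np.exp(lj - log_likelihood(x, (weights, means, variances))[:, None])
        # M-step: re-estimate weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        second_moment = (resp.T @ x ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, eps)
    return weights, means, variances

# Stand-in data (assumption): two "real" sources and one unseen generator.
rng = np.random.default_rng(0)
real_train = np.concatenate([rng.normal(0.0, 1.0, (200, 8)),
                             rng.normal(3.0, 1.0, (200, 8))])
unseen = rng.normal(-4.0, 1.0, (100, 8))

params = fit_diag_gmm(real_train)
# Threshold chosen from real data only; no synthetic samples are used for fitting.
threshold = np.percentile(log_likelihood(real_train, params), 1)
flagged = log_likelihood(unseen, params) < threshold
print(f"flagged {flagged.mean():.0%} of the stand-in synthetic embeddings")
```

Because the threshold is set from real data alone, the same fitted model can be applied to a generator that was never seen, which is the sense in which the detection is zero-shot.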

Abstract

The emergence of advanced AI-based tools to generate realistic images poses significant challenges for forensic detection and source attribution, especially as new generative techniques appear rapidly. Traditional methods often fail to generalize to unseen generators due to reliance on features specific to known sources during training. To address this problem, we propose a novel approach that explicitly models forensic microstructures: subtle, pixel-level patterns unique to the image creation process. Using only real images in a self-supervised manner, we learn a set of diverse predictive filters to extract residuals that capture different aspects of these microstructures. By jointly modeling these residuals across multiple scales, we obtain a compact model whose parameters constitute a unique forensic self-description for each image. This self-description enables us to perform zero-shot detection of synthetic images, open-set source attribution of images, and clustering based on source without prior knowledge. Extensive experiments demonstrate that our method achieves superior accuracy and adaptability compared to competing techniques, advancing the state of the art in synthetic media forensics.
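
To make the idea of a predictive filter and its residual concrete, the sketch below predicts each pixel from its eight surrounding pixels with a single linear filter and keeps the prediction error. It is an illustration under assumptions rather than the paper's procedure: the abstract describes a set of diverse filters learned in a self-supervised manner and modeled across multiple scales, whereas this example fits one 3x3 filter by ordinary least squares at a single scale. The helper names `neighbor_matrix`, `fit_predictive_filter`, and `residual` are hypothetical.

```python
import numpy as np

def neighbor_matrix(img, radius=1):
    """Return interior pixels and, for each one, its neighbors (center excluded)."""
    h, w = img.shape
    size = 2 * radius + 1
    columns = []
    for dy in range(size):
        for dx in range(size):
            if dy == radius and dx == radius:
                continue  # the center pixel is the prediction target, not an input
            shifted = img[dy:h - 2 * radius + dy, dx:w - 2 * radius + dx]
            columns.append(shifted.ravel())
    targets = img[radius:h - radius, radius:w - radius].ravel()
    return targets, np.stack(columns, axis=1)

def fit_predictive_filter(images, radius=1):
    """Least-squares filter that predicts a pixel from its neighbors."""
    pairs = [neighbor_matrix(im, radius) for im in images]
    targets = np.concatenate([t for t, _ in pairs])
    neighbors = np.concatenate([n for _, n in pairs])
    coeffs, *_ = np.linalg.lstsq(neighbors, targets, rcond=None)
    return coeffs

def residual(img, coeffs, radius=1):
    """Prediction error map: what the filter cannot explain from the neighbors."""
    targets, neighbors = neighbor_matrix(img, radius)
    h, w = img.shape
    return (targets - neighbors @ coeffs).reshape(h - 2 * radius, w - 2 * radius)

# Stand-in "real" image (assumption): a smooth intensity ramp.
yy, xx = np.mgrid[0:16, 0:16]
smooth = 0.5 * xx + 0.25 * yy + 1.0
coeffs = fit_predictive_filter([smooth])

# A fine checkerboard texture that the filter never saw during fitting.
textured = smooth + (-1.0) ** (xx + yy)

print("residual energy, smooth image:  ", np.mean(residual(smooth, coeffs) ** 2))
print("residual energy, textured image:", np.mean(residual(textured, coeffs) ** 2))
```

The smooth image is almost perfectly predictable from its neighbors, so its residual is close to zero, while the unseen texture leaves a clear trace in the residual. Pixel-level structure that the filter does not expect is what ends up in the residual.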

Paper Structure

This paper contains 26 sections, 12 equations, 7 figures, 14 tables.

Figures (7)

  • Figure 2: Visualization of real (top row) and synthetic (bottom row) images in the datasets used in this paper.
  • Figure 3: Our method can detect and attribute synthetic images without prior knowledge of the source. We do this by extracting residuals containing forensic microstructures from a single image and jointly modeling them across scales as a forensic self-description.
  • Figure 4: Visualization of the average power spectrum of different filters in the forensic self-descriptions obtained from various sources.
  • Figure 5: 2D t-SNE plot showing the distribution of the self-descriptions among real and synthetic sources.
  • Figure 6: Zero-shot detection performance of our method evaluated on real datasets that are not seen during training. Performance on the seen dataset is also provided for comparison.
  • ...and 2 more figures