Forensic Self-Descriptions Are All You Need for Zero-Shot Detection, Open-Set Source Attribution, and Clustering of AI-generated Images
Tai D. Nguyen, Aref Azizpour, Matthew C. Stamm
TL;DR
The paper tackles the growing challenge of distinguishing AI-generated images from real ones and attributing them to their sources, especially when the generator was not seen during training. It introduces forensic self-descriptions: compact embeddings that capture the intrinsic forensic microstructures left by the image formation process. They are derived from multi-scale residuals produced by self-supervised predictive filters learned from real images. The method fits a Gaussian mixture to the self-descriptions of real images, models a distribution for each known source, and clusters in the embedding space. Together these give robust zero-shot detection, open-set source attribution, and unsupervised clustering, while the feature extractor itself is trained without any synthetic images. Experiments across diverse real and synthetic datasets show state-of-the-art performance, including strong worst-case results and resilience to JPEG compression, which highlights the method's practical value for forensic analysis of synthetic media. The work provides a compact, adaptable framework applicable to forensic investigations and media provenance analysis.
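To make the detection and attribution logic concrete, here is a minimal Python sketch that assumes each image has already been reduced to a fixed-length self-description vector. It fits a diagonal-covariance Gaussian mixture with EM to real-image vectors and flags any vector whose log-likelihood falls below a threshold calibrated on held-out real images. For open-set attribution it keeps one model per known source and returns `None` (unknown) when even the best-scoring source is unlikely. The class and function names, the diagonal covariance, the component count, and the percentile-style threshold are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def _log_diag_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    # Log-density of a Gaussian with diagonal covariance.
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def _logsumexp(vals: List[float]) -> float:
    # Numerically stable log(sum(exp(vals))).
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


class DiagGMM:
    """Hypothetical diagonal-covariance Gaussian mixture fitted with EM."""

    def __init__(self, k: int = 2, iters: int = 50, reg: float = 1e-3, seed: int = 0):
        self.k, self.iters, self.reg, self.seed = k, iters, reg, seed

    def _component_logs(self, x: Vector) -> List[float]:
        # log(weight_j) + log N(x | mean_j, var_j) for every component j.
        return [math.log(w) + _log_diag_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data: List[Vector]) -> "DiagGMM":
        rng = random.Random(self.seed)
        d = len(data[0])
        # Initialise means at randomly chosen samples, unit variances, equal weights.
        self.means = [list(x) for x in rng.sample(data, self.k)]
        self.vars = [[1.0] * d for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.iters):
            # E-step: posterior responsibility of each component for each sample.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = _logsumexp(logs)
                resp.append([math.exp(lg - norm) for lg in logs])
            # M-step: re-estimate weights, means and (floored) variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-12
                self.weights[j] = nj / len(data)
                self.means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nj
                                 for i in range(d)]
                self.vars[j] = [sum(r[j] * (x[i] - self.means[j][i]) ** 2
                                    for r, x in zip(resp, data)) / nj + self.reg
                                for i in range(d)]
        return self

    def score(self, x: Vector) -> float:
        # Log-likelihood of one self-description under the mixture.
        return _logsumexp(self._component_logs(x))


def calibrate_threshold(model: DiagGMM, real_holdout: List[Vector], fpr: float = 0.05) -> float:
    # Score below which roughly a fraction `fpr` of held-out real images fall.
    scores = sorted(model.score(x) for x in real_holdout)
    return scores[min(int(fpr * len(scores)), len(scores) - 1)]


def detect_synthetic(real_model: DiagGMM, x: Vector, threshold: float) -> bool:
    # Zero-shot: flag anything the real-image model finds unlikely.
    return real_model.score(x) < threshold


def attribute(source_models: Dict[str, DiagGMM], x: Vector, threshold: float) -> Optional[str]:
    # Open-set: best-scoring known source, or None if even that source is unlikely.
    best = max(source_models, key=lambda name: source_models[name].score(x))
    return best if source_models[best].score(x) >= threshold else None
```

Setting the threshold from real images alone keeps detection zero-shot, since no synthetic examples are needed to choose it. In practice a library implementation such as scikit-learn's `GaussianMixture` (whose `score_samples` returns per-sample log-likelihoods) would replace the hand-written EM loop.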
Abstract
The emergence of advanced AI-based tools for generating realistic images poses significant challenges for forensic detection and source attribution, especially as new generative techniques appear rapidly. Traditional methods often fail to generalize to unseen generators because they rely on features specific to the sources seen during training. To address this problem, we propose a novel approach that explicitly models forensic microstructures: subtle, pixel-level patterns unique to the image creation process. Using only real images in a self-supervised manner, we learn a set of diverse predictive filters to extract residuals that capture different aspects of these microstructures. By jointly modeling these residuals across multiple scales, we obtain a compact model whose parameters constitute a unique forensic self-description for each image. This self-description enables us to perform zero-shot detection of synthetic images, open-set source attribution of images, and clustering of images by source without prior knowledge of the sources. Extensive experiments demonstrate that our method achieves superior accuracy and adaptability compared to competing techniques, advancing the state of the art in synthetic media forensics.
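The extraction pipeline can be sketched in plain Python under heavy simplifying assumptions. A single linear predictive filter over a 4-pixel neighbourhood is fitted by least squares on real images only; the self-supervised target is the centre pixel itself. Residuals are then computed at several scales via 2x2 average pooling, and the "compact model" is reduced to the residual mean and variance per scale. The paper learns a diverse set of filters and jointly models their residuals, so every name here, the filter support, and the summary statistics are hypothetical stand-ins. They only show how real-only training, residuals, and multi-scale modeling fit together.

```python
from typing import List

Image = List[List[float]]
# Hypothetical filter support: predict a pixel from its 4 nearest neighbours.
OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gauss-Jordan elimination with partial pivoting for a small dense system.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][n] for r in range(n)]


def _neighbours(img: Image, y: int, x: int) -> List[float]:
    return [img[y + dy][x + dx] for dy, dx in OFFSETS]


def learn_filter(real_images: List[Image], ridge: float = 1e-6) -> List[float]:
    # Self-supervised: each interior pixel is predicted from its own neighbours,
    # so no labels are needed. Solves ridge-regularised normal equations.
    k = len(OFFSETS)
    ata = [[ridge if i == j else 0.0 for j in range(k)] for i in range(k)]
    atb = [0.0] * k
    for img in real_images:
        for y in range(1, len(img) - 1):
            for x in range(1, len(img[0]) - 1):
                nb = _neighbours(img, y, x)
                for i in range(k):
                    atb[i] += nb[i] * img[y][x]
                    for j in range(k):
                        ata[i][j] += nb[i] * nb[j]
    return _solve(ata, atb)


def residual(img: Image, w: List[float]) -> List[float]:
    # Prediction error of the filter at every interior pixel.
    return [img[y][x] - sum(wi * v for wi, v in zip(w, _neighbours(img, y, x)))
            for y in range(1, len(img) - 1) for x in range(1, len(img[0]) - 1)]


def downsample(img: Image) -> Image:
    # 2x2 average pooling to move to the next (coarser) scale.
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]


def self_description(img: Image, w: List[float], scales: int = 3) -> List[float]:
    # Stand-in compact model: residual mean and variance at each scale.
    feats: List[float] = []
    for _ in range(scales):
        r = residual(img, w)
        mu = sum(r) / len(r)
        feats += [mu, sum((v - mu) ** 2 for v in r) / len(r)]
        img = downsample(img)
    return feats
```

For example, adding Gaussian noise to a smooth image raises its scale-0 residual variance (`feats[1]`), because a filter trained only on smooth real images cannot predict the noise.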
