Detecting Origin Attribution for Text-to-Image Diffusion Models
Katherine Xu, Lingzhi Zhang, Jianbo Shi
TL;DR
This work studies origin attribution for modern text-to-image (T2I) diffusion models. The authors train image attributors to identify the source generator among 12 contemporary T2I models (plus real images), and they probe how inference-time hyperparameters, post-editing, and visual granularity affect attribution. They build a dataset of nearly 450K images from diverse generators and prompts, on which RGB-based attributors achieve over $90\%$ accuracy on the $13$-way task, indicating strong generator fingerprints. Initialization seeds are almost perfectly detectable (near $99\%$ accuracy), and other hyperparameters also leave detectable traces, while post-editing degrades but does not erase attribution capability. Beyond RGB, style-based (Gram matrix) and mid-level representations yield robust signals (e.g., $92.80\%$ accuracy for style features), suggesting that texture, structure, and layout encode generator fingerprints. These findings advance fake-image forensics and copyright protection, and the authors provide a framework to extend attribution to new open-source and proprietary generators.
Abstract
Modern text-to-image (T2I) diffusion models can generate images with remarkable realism and creativity. These advancements have sparked research in fake image detection and attribution, yet prior studies have not fully explored the practical and scientific dimensions of this task. In addition to attributing images to 12 state-of-the-art T2I generators, we provide extensive analyses of which inference-stage hyperparameters and image modifications are discernible. Our experiments reveal that initialization seeds are highly detectable, and that other subtle variations in the image generation process are also detectable to some extent. We further investigate what visual traces are leveraged in image attribution by perturbing high-frequency details and employing mid-level representations of image style and structure. Notably, altering high-frequency information causes only slight reductions in accuracy, and training an attributor on style representations outperforms training on RGB images. Our analyses underscore that fake images are detectable and attributable at various levels of visual granularity.
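To make the style representation concrete, the following is a minimal sketch of a Gram matrix computed from a feature map, the kind of statistic the TL;DR refers to as the style-based input. It is an illustration under assumptions, not the authors' pipeline: the feature extractor, the layer choice, and the normalization by the number of spatial positions are not specified in this section, and `gram_matrix`, `feature_map`, and the random activations are hypothetical stand-ins. Pure Python is used only to keep the sketch dependency-free.

```python
import random


def gram_matrix(features):
    """Return the C x C Gram matrix of a feature map.

    `features` is a list of C channels, each a flat list of H*W activations.
    Entry (i, j) is the inner product of channels i and j, averaged over
    spatial positions (the averaging is an assumed normalization).
    """
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n_positions for ch_j in features]
        for ch_i in features
    ]


# Hypothetical stand-in for activations from some feature extractor:
# 4 channels over an 8x8 grid, flattened to 64 positions per channel.
random.seed(0)
channels, positions = 4, 64
feature_map = [
    [random.gauss(0.0, 1.0) for _ in range(positions)] for _ in range(channels)
]
gram = gram_matrix(feature_map)

# Apply the same spatial shuffle to every channel, then recompute.
perm = list(range(positions))
random.shuffle(perm)
shuffled_map = [[channel[p] for p in perm] for channel in feature_map]
gram_shuffled = gram_matrix(shuffled_map)
```

Because each entry sums over all positions, `gram` and `gram_shuffled` agree up to floating-point rounding: the Gram matrix keeps channel co-occurrence statistics (texture-like information) and discards where in the image they occur. An attributor trained on such features therefore cannot rely on exact pixel layout.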
