DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models
Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang
TL;DR
This paper presents the first systematic study of detecting and attributing fake images produced by text-to-image (T2I) generation models. It introduces two detectors (image-only and hybrid) and two attributors (image-only and hybrid); the hybrid variants combine the image with CLIP-based prompt embeddings and use BLIP-generated prompts when the original prompts are unavailable. Experiments on four popular T2I models across two datasets show that fake images share a common artifact, which enables detection, and that each model leaves its own fingerprint, which enables attribution. The experiments also show that prompts on the "person" topic, or with a length between 25 and 75, yield fake images with higher authenticity. The work further shows robustness to unseen models via adaptation strategies and offers insights into prompt design, with the aim of mitigating the societal risks posed by rapidly evolving fake imagery. A code release is promised for community use.
Abstract
Text-to-image generation models that generate images based on prompt descriptions have attracted increasing attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image generation models. We then attribute these fake images to their source models, such that model owners can be held responsible for their models' misuse. We further investigate how prompts that generate fake images affect detection and attribution. We conduct extensive experiments on four popular text-to-image generation models, namely DALL·E 2, Stable Diffusion, GLIDE, and Latent Diffusion, and two benchmark prompt-image datasets. Empirical results show that (1) fake images generated by various models can be distinguished from real ones, as there exists a common artifact shared by fake images from different models; (2) fake images can be effectively attributed to their source models, as different models leave unique fingerprints in their generated images; (3) prompts with the "person" topic or a length between 25 and 75 enable models to generate fake images with higher authenticity. All findings contribute to the community's insight into the threats caused by text-to-image generation models. We call on the community to consider countermeasures, such as ours, against rapidly evolving fake image generation.
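The hybrid idea summarized above, in which an image representation is paired with a prompt representation before classification, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embedding width `EMB_DIM`, the L2 normalization, the two-layer MLP, and the helper names `hybrid_features`, `init_mlp`, and `mlp_probs` are hypothetical and not taken from the paper. Random vectors stand in for real CLIP image and prompt embeddings, and the weights are untrained, so the outputs carry no meaning beyond their shapes.

```python
import numpy as np

EMB_DIM = 512  # assumed embedding width (stand-in for a CLIP-style encoder output)


def hybrid_features(image_emb: np.ndarray, prompt_emb: np.ndarray) -> np.ndarray:
    """Concatenate L2-normalized image and prompt embeddings along the last axis."""
    def l2norm(x: np.ndarray) -> np.ndarray:
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    return np.concatenate([l2norm(image_emb), l2norm(prompt_emb)], axis=-1)


def init_mlp(in_dim: int, hidden_dim: int, out_dim: int, seed: int = 0) -> dict:
    """Randomly initialize a two-layer MLP (untrained; illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.02, (in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.02, (hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }


def mlp_probs(params: dict, x: np.ndarray) -> np.ndarray:
    """Forward pass: ReLU hidden layer, then a numerically stable softmax."""
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)


# Random stand-ins for the embeddings of 4 image/prompt pairs.
rng = np.random.default_rng(42)
image_emb = rng.normal(size=(4, EMB_DIM))
prompt_emb = rng.normal(size=(4, EMB_DIM))

feats = hybrid_features(image_emb, prompt_emb)

# Detector: 2 classes (real vs. fake). Attributor: 4 classes, one per
# generation model named in the abstract.
detector = init_mlp(2 * EMB_DIM, 128, 2, seed=0)
attributor = init_mlp(2 * EMB_DIM, 128, 4, seed=1)

det_probs = mlp_probs(detector, feats)
attr_probs = mlp_probs(attributor, feats)
```

In this sketch the detector and the attributor differ only in the number of output classes. An image-only variant would pass `image_emb` alone to an MLP whose input width is `EMB_DIM`.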
