DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models

Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang

TL;DR

This paper presents the first systematic study of detecting and attributing fake images produced by text-to-image (T2I) generation models. It introduces two detectors (image-only and hybrid) and two attributors (image-only and hybrid); the hybrid variants couple image data with CLIP-based prompt embeddings, falling back to BLIP-generated prompts when needed. In experiments on four popular T2I models and two datasets, the authors find a common artifact shared by fake images, which enables detection, and model-specific fingerprints, which enable attribution. They also find that prompts affect authenticity: prompts on the "person" topic, or with a length between 25 and 75, yield fake images with higher authenticity. The work further shows robustness to unseen models via adaptation strategies and offers insights into prompt design, with the aim of mitigating the societal risks posed by rapidly evolving fake imagery. A code release is promised for community use.

Abstract

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the misuse of their generated fake images. To tackle this problem, we pioneer a systematic study on the detection and attribution of fake images generated by text-to-image generation models. Concretely, we first build a machine learning classifier to detect the fake images generated by various text-to-image generation models. We then attribute these fake images to their source models, such that model owners can be held responsible for their models' misuse. We further investigate how prompts that generate fake images affect detection and attribution. We conduct extensive experiments on four popular text-to-image generation models, including DALL·E 2, Stable Diffusion, GLIDE, and Latent Diffusion, and two benchmark prompt-image datasets. Empirical results show that (1) fake images generated by various models can be distinguished from real ones, as there exists a common artifact shared by fake images from different models; (2) fake images can be effectively attributed to their source models, as different models leave unique fingerprints in their generated images; (3) prompts with the "person" topic or a length between 25 and 75 enable models to generate fake images with higher authenticity. All findings contribute to the community's insight into the threats caused by text-to-image generation models. We appeal to the community's consideration of the counterpart solutions, like ours, against the rapidly-evolving fake image generation.
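
The difference between the image-only and hybrid detectors described in the TL;DR can be illustrated with a minimal sketch. It assumes that image and prompt embeddings are already available as plain vectors (for example from CLIP encoders, which are not called here) and that the hybrid detector simply concatenates the two; that concatenation, the logistic-regression classifier, the synthetic data, and all function names are hypothetical stand-ins for illustration, not the authors' implementation.

```python
import math
import random


def concat_features(image_emb, text_emb=None):
    """Image-only input uses image_emb alone; hybrid input appends the prompt embedding.

    Concatenation is an assumption made for this sketch.
    """
    if text_emb is None:
        return list(image_emb)
    return list(image_emb) + list(text_emb)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, epochs=300, lr=0.1):
    """Batch gradient descent on the logistic loss (stand-in binary classifier)."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - target
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_fake(w, b, x):
    """Return True when the classifier labels the feature vector as fake."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5


def toy_dataset(n=40, img_dim=4, txt_dim=3, seed=0):
    """Synthetic embeddings: label 1 (fake) shifts one image dimension.

    The shift is a made-up stand-in for a detectable artifact, not real data.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        image_emb = [rng.gauss(0, 1) for _ in range(img_dim)]
        image_emb[0] += 3.0 if label else -3.0
        text_emb = [rng.gauss(0, 1) for _ in range(txt_dim)]
        X.append(concat_features(image_emb, text_emb))  # hybrid features
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_dataset()
    w, b = train_logistic(X, y)
    correct = sum(predict_fake(w, b, x) == bool(t) for x, t in zip(X, y))
    print(f"training accuracy on toy data: {correct / len(X):.2f}")
```

Calling `concat_features(image_emb)` without a prompt embedding gives the image-only input, so the same training routine covers both detector types in this sketch.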

Paper Structure

This paper contains 27 sections, 13 figures, 3 tables.

Figures (13)

  • Figure 1: An illustration of our work, including fake image detection, fake image attribution, and prompt analysis.
  • Figure 2: An illustration of fake image detection. The red part describes image-only detection. The green part describes hybrid detection. The blue part describes fake images generated by other text-to-image generation models.
  • Figure 3: The performance of the forensic classifier and detectors. We conduct the evaluation on (a) MSCOCO and (b) Flickr30k, respectively.
  • Figure 4: The visualization of frequency analysis on (a) real images and (b) fake images.
  • Figure 5: The probability distribution of the connection between the real/fake images and the corresponding prompts.
  • ...and 8 more figures
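
The frequency analysis mentioned in the Figure 4 caption is not specified in this section. One common way to produce such a visualization is the centered log-magnitude spectrum of a 2D discrete Fourier transform, sketched below under that assumption. The function name is hypothetical, and the direct O(N^4) summation is only suitable for tiny illustrative inputs.

```python
import cmath
import math


def dft2_log_magnitude(img):
    """Centered log-magnitude 2D DFT spectrum of a grayscale image.

    `img` is a list of equal-length rows of numbers. The zero-frequency
    (DC) term is shifted to index (h // 2, w // 2).
    """
    h, w = len(img), len(img[0])
    spec = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            # log1p compresses the dynamic range so weak components stay visible.
            spec[(u + h // 2) % h][(v + w // 2) % w] = math.log1p(abs(acc))
    return spec


if __name__ == "__main__":
    # A 4x4 checkerboard concentrates its non-DC energy at the highest frequency.
    checker = [[(x + y) % 2 for x in range(4)] for y in range(4)]
    for row in dft2_log_magnitude(checker):
        print(" ".join(f"{value:.2f}" for value in row))
```

Comparing such spectra averaged over sets of real and generated images is one way to look for systematic differences; whether the paper follows this exact procedure is not stated here.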