Table of Contents
Fetching ...

Prefilled responses enhance zero-shot detection of AI-generated images

Zoher Kachwala, Danishjeet Singh, Danielle Yang, Filippo Menczer

TL;DR

The paper tackles the problem of robustly detecting AI-generated images in a zero-shot setting as generative models rapidly evolve. It introduces Prefill-Guided Thinking (PGT), a lightweight prompting strategy that prefixes VLM responses with a task-aligned phrase to guide reasoning toward synthesis artifacts without fine-tuning. Across three diverse benchmarks and three open-source VLMs, a specific S2 prefill yields up to a 24% relative improvement in Macro F1 over baselines and CoT, demonstrating strong cross-generator generalization. The work suggests that simple, interpretable prefilling can provide scalable, generalizable detection for visual trust in AI-generated content, albeit with considerations around computational cost and prompt design.

Abstract

As AI models generate increasingly realistic images, growing concerns over potential misuse underscore the need for reliable detection. Traditional supervised detection methods depend on large, curated datasets for training and often fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language Models (VLMs) for zero-shot detection of AI-generated images. We evaluate VLM performance on three diverse benchmarks encompassing synthetic images of human faces, objects, and animals produced by 16 different state-of-the-art image generators. While off-the-shelf VLMs perform poorly on these datasets, we find that their reasoning can be guided effectively through simple response prefilling -- a method we call Prefill-Guided Thinking (PGT). In particular, prefilling a VLM response with the task-aligned phrase "Let's examine the style and the synthesis artifacts" improves the Macro F1 scores of three widely used open-source VLMs by up to 24%.

Prefilled responses enhance zero-shot detection of AI-generated images

TL;DR

The paper tackles the problem of robustly detecting AI-generated images in a zero-shot setting as generative models rapidly evolve. It introduces Prefill-Guided Thinking (PGT), a lightweight prompting strategy that prefixes VLM responses with a task-aligned phrase to guide reasoning toward synthesis artifacts without fine-tuning. Across three diverse benchmarks and three open-source VLMs, a specific S2 prefill yields up to a 24% relative improvement in Macro F1 over baselines and CoT, demonstrating strong cross-generator generalization. The work suggests that simple, interpretable prefilling can provide scalable, generalizable detection for visual trust in AI-generated content, albeit with considerations around computational cost and prompt design.

Abstract

As AI models generate increasingly realistic images, growing concerns over potential misuse underscore the need for reliable detection. Traditional supervised detection methods depend on large, curated datasets for training and often fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language Models (VLMs) for zero-shot detection of AI-generated images. We evaluate VLM performance on three diverse benchmarks encompassing synthetic images of human faces, objects, and animals produced by 16 different state-of-the-art image generators. While off-the-shelf VLMs perform poorly on these datasets, we find that their reasoning can be guided effectively through simple response prefilling -- a method we call Prefill-Guided Thinking (PGT). In particular, prefilling a VLM response with the task-aligned phrase "Let's examine the style and the synthesis artifacts" improves the Macro F1 scores of three widely used open-source VLMs by up to 24%.

Paper Structure

This paper contains 15 sections, 12 figures, 4 tables.

Figures (12)

  • Figure 1: Top: Sample images from D3 (top row), DF40 (middle row), and GenImage (bottom row) datasets. Can you guess which ones are real? The answer is in the footnote on the next page. Bottom: Guiding model thinking with prefilled responses: chain-of-thought (left) vs task-aligned (right).
  • Figure 2: Illustration of PGT for the detection of an AI-generated image using a VLM (Qwen2.5-7B). Input text is marked in grey , response text in blue . (a) A baseline user query results in the incorrect response real . (b) Using the chain-of-thought prefill Let's think step by step improves reasoning, but the classification remains incorrect. (c) Using our proposed S2 prefill Let's examine the style and the synthesis artifacts leads to the correct classification: ai‑generated . Full reasoning traces for all three methods in the Appendix (Figs. \ref{['fig:elephantzeroshot']}, \ref{['fig:elephantzeroshotcot']}, \ref{['fig:elephantzeroshotss']}).
  • Figure 3: Detection performance (Macro F1) across models, datasets, and PGT variations. Bars are annotated with relative improvements of S2 over the next best method and 95% confidence error bars from 10k bootstrap iterations.
  • Figure 4: Detection performance (Recall %) for Llama across different datasets and their state-of-the-art synthetic image generators. Similar figures for LLaVA and Qwen in the Appendix (Figs. \ref{['fig:recall_radar_qwen']}, \ref{['fig:recall_radar_llava']}).
  • Figure 5: Top 20 words associated with the highest improvement in detection correctness for Qwen. Similar plots for LLaVA and Llama in the Appendix (Figs. \ref{['fig:vocab_lr_llava']}, \ref{['fig:vocab_lr_llama']}).
  • ...and 7 more figures