Prefilled responses enhance zero-shot detection of AI-generated images
Zoher Kachwala, Danishjeet Singh, Danielle Yang, Filippo Menczer
TL;DR
The paper tackles the problem of robustly detecting AI-generated images in a zero-shot setting as generative models rapidly evolve. It introduces Prefill-Guided Thinking (PGT), a lightweight prompting strategy that prefixes VLM responses with a task-aligned phrase to guide reasoning toward synthesis artifacts without fine-tuning. Across three diverse benchmarks and three open-source VLMs, a specific S2 prefill yields up to a 24% relative improvement in Macro F1 over baselines and CoT, demonstrating strong cross-generator generalization. The work suggests that simple, interpretable prefilling can provide scalable, generalizable detection for visual trust in AI-generated content, albeit with considerations around computational cost and prompt design.
Abstract
As AI models generate increasingly realistic images, growing concerns over potential misuse underscore the need for reliable detection. Traditional supervised detection methods depend on large, curated datasets for training and often fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language Models (VLMs) for zero-shot detection of AI-generated images. We evaluate VLM performance on three diverse benchmarks encompassing synthetic images of human faces, objects, and animals produced by 16 different state-of-the-art image generators. While off-the-shelf VLMs perform poorly on these datasets, we find that their reasoning can be guided effectively through simple response prefilling -- a method we call Prefill-Guided Thinking (PGT). In particular, prefilling a VLM response with the task-aligned phrase "Let's examine the style and the synthesis artifacts" improves the Macro F1 scores of three widely used open-source VLMs by up to 24%.
