DF-LLaVA: Unlocking MLLM's potential for Synthetic Image Detection via Prompt-Guided Knowledge Injection

Zhuokang Shen, Kaisen Zhang, Bohan Jia, Heming Jia, Yuan Fang, Zhou Yu, Shaohui Lin

TL;DR

DF-LLaVA tackles the dual challenge of accurate synthetic image detection and interpretable artifact explanation by extracting latent discriminative knowledge from a vision encoder and injecting it into prompts that guide a multimodal large language model (MLLM). The method freezes the vision encoder, trains a binary classifier on its [CLS] token, and uses the classifier’s probabilistic outputs as prompt-based knowledge; the vision-language (V/L) projector and the LLM are then fine-tuned on the augmented data. Empirical results on FakeClue, LOKI, and DMImage C10 show that DF-LLaVA achieves state-of-the-art or near-state-of-the-art detection performance while delivering richer artifact explanations, surpassing both existing MLLMs and some expert baselines. The approach enables accurate, interpretable forensic analysis of AI-generated content and can be extended to other MLLM architectures via the proposed Prompt-Guided Knowledge Injection (PGKI) framework.

Abstract

With the increasing prevalence of synthetic images, evaluating image authenticity and locating forgeries accurately while maintaining human interpretability remains a challenging task. Existing detection models primarily focus on simple authenticity classification, ultimately providing only a forgery probability or binary judgment, which offers limited explanatory insights into image authenticity. Moreover, while MLLM-based detection methods can provide more interpretable results, they still lag behind expert models in terms of pure authenticity classification accuracy. To address this, we propose DF-LLaVA, a simple yet effective framework that unlocks the intrinsic discrimination potential of MLLMs. Our approach first extracts latent knowledge from MLLMs and then injects it into training via prompts. This framework allows LLaVA to achieve outstanding detection accuracy exceeding expert models while still maintaining the interpretability offered by MLLMs. Extensive experiments confirm the superiority of our DF-LLaVA, achieving both high accuracy and explainability in synthetic image detection. Code is available online at: https://github.com/Eliot-Shen/DF-LLaVA.

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: DF-LLaVA provides comprehensive artifact-level interpretability with detection accuracy outperforming expert models.
  • Figure 2: Overview of DF-LLaVA during inference. DF-LLaVA leverages its frozen vision encoder via a binary classifier for initial authenticity estimation, injects the classifier's probabilistic output into prompts to enhance detection accuracy, and finally explains artifacts from various perspectives.
  • Figure 3: Overview of DF-LLaVA during training and inference. (a) In Stage 1, we adopt the UnivFD approach to train a binary classifier, whose predictions are injected into the training set as additional prompts. (b) In Stage 2, LLaVA is fine-tuned on this enriched dataset.
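
The two-stage idea in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. Random vectors stand in for frozen [CLS] features, and a plain logistic-regression probe stands in for the UnivFD-style classifier. The function names, the sample dictionary layout, and the hint wording are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fake_probability(w, b, cls_feature):
    """Probe output: estimated probability that the image is synthetic."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, cls_feature)) + b)


def train_probe(features, labels, lr=0.5, epochs=200):
    """Stage 1 (sketch): fit a linear probe on frozen features (label 1 = fake)."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = fake_probability(w, b, x) - y  # gradient of the log loss
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def inject_knowledge(sample, p_fake):
    """Return a copy of a sample with the probe's estimate appended to its prompt.

    The hint wording is an assumption; the paper's template is not given here.
    """
    hint = (f"A binary classifier on the vision encoder estimates a "
            f"{p_fake:.2f} probability that this image is synthetic.")
    return {**sample, "prompt": f"{sample['prompt']}\n{hint}"}


# Toy stand-in for frozen [CLS] features: fakes cluster at +1 on the first
# dimension, real images at -1. Real features would come from the encoder.
rng = random.Random(0)
DIM = 4


def toy_feature(center):
    return [rng.gauss(center, 0.3)] + [rng.gauss(0.0, 0.3) for _ in range(DIM - 1)]


features = [toy_feature(1.0) for _ in range(20)] + [toy_feature(-1.0) for _ in range(20)]
labels = [1] * 20 + [0] * 20
w, b = train_probe(features, labels)

# Build the enriched dataset that Stage 2 fine-tuning would consume.
dataset = [{"image": f"img_{i}.png", "prompt": "Is this image real or fake?"}
           for i in range(len(features))]
enriched = [inject_knowledge(s, fake_probability(w, b, x))
            for s, x in zip(dataset, features)]
print(enriched[0]["prompt"])
```

In a real pipeline, per the captions above, the features would come from the frozen vision encoder, and the enriched samples would be used to fine-tune LLaVA. At inference the same probe output would be injected into the prompt before generation.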