Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

Yikun Ji, Yan Hong, Jiahui Zhan, Haoxing Chen, Jun Lan, Huijia Zhu, Weiqiang Wang, Liqing Zhang, Jianfu Zhang

TL;DR

This work addresses the need for explainable AI-generated image detection by leveraging multi-modal large language models (MLLMs). It introduces a structured interrogation framework built from six prompts (P1–P6; P0 denotes the vanilla baseline prompt) and a fusion process that reasons about whether an image is real or AI-generated, improving both accuracy and interpretability. In experiments on a diverse 2,000-image dataset, GPT-4o-based fusion achieves up to 93.4% accuracy, surpassing traditional detectors and the average human, while also providing transparent justifications. The findings show the potential of combining diverse reasoning paradigms with MLLMs to deliver robust, explainable detection, but they also expose limitations, such as prompt sensitivity and rejection behavior, that warrant further refinement.

Abstract

Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both strong generalization and transparency. Recent progress in Multi-modal Large Language Models (MLLMs) offers new opportunities for reasoning-based AI-generated image detection. In this work, we evaluate the capabilities of MLLMs in comparison to traditional detection methods and human evaluators, highlighting their strengths and limitations. Furthermore, we design six distinct prompts and propose a framework that integrates these prompts to develop a more robust, explainable, and reasoning-driven detection system. The code is available at https://github.com/Gennadiyev/mllm-defake.
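
The prompt-then-fuse idea described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The prompt texts are placeholders, and `ask`, `interrogate`, `build_fusion_prompt`, and `detect` are hypothetical names introduced only for illustration; `ask` stands in for whatever function sends an image and a text prompt to an MLLM and returns its reply.

```python
from typing import Callable, Dict

# Placeholder prompt texts (assumption): the paper's actual P1-P6 wording
# is not reproduced here.
PROMPTS: Dict[str, str] = {
    f"P{i}": f"<placeholder text for prompt P{i}>" for i in range(1, 7)
}

# Hypothetical signature: ask(image_ref, prompt_text) -> model reply.
AskFn = Callable[[str, str], str]


def interrogate(image_ref: str, ask: AskFn) -> Dict[str, str]:
    """Query the model once per prompt and collect the free-text replies."""
    return {name: ask(image_ref, text) for name, text in PROMPTS.items()}


def build_fusion_prompt(responses: Dict[str, str]) -> str:
    """Combine the per-prompt replies into a single fusion query."""
    lines = [
        "Several independent analyses of the same image follow.",
        "Weigh the reasons they give, not only their verdicts, and answer",
        "'real' or 'generated' with a short justification.",
    ]
    for name in sorted(responses):
        lines.append(f"[{name}] {responses[name]}")
    return "\n".join(lines)


def detect(image_ref: str, ask: AskFn) -> str:
    """Run all six prompts, then ask the model for a fused verdict."""
    responses = interrogate(image_ref, ask)
    return ask(image_ref, build_fusion_prompt(responses))
```

Passing the model call in as a parameter keeps the sketch independent of any particular MLLM client and makes it easy to exercise with a stub function.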

Paper Structure

This paper contains 30 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: Traditional methods do not provide reasons for their predictions, while our method provides sensible reasons behind the verdict.
  • Figure 2: The overall design of the proposed MLLM-based AI-generated image detection framework.
  • Figure 3: (a) Examples where GPT-4o fusion gives the correct result, while CNNSpot, AEROBLADE, and vanilla GPT-4o (P0) fail. (b) A fusion example. GPT-4o can combine responses from P1–P6 effectively, drawing conclusions from reasons instead of verdicts.
  • Figure 4: Percentage of cases where prompts give different verdicts. The data is aggregated from all four models evaluated.
  • Figure 5: Distribution of AI generation methods among images in the dataset.
  • ...and 4 more figures