Towards Generalizable Forgery Detection and Reasoning

Yueying Gao, Dongliang Chang, Bingyao Yu, Haotian Qin, Muxi Diao, Lei Chen, Kongming Liang, Zhanyu Ma

TL;DR

Experiments across multiple generative models demonstrate that FakeReasoning not only achieves robust generalization but also outperforms state-of-the-art methods on both detection and reasoning tasks.

Abstract

Accurate and interpretable detection of AI-generated images is essential for mitigating risks associated with AI misuse. However, the substantial domain gap among generative models makes it challenging to develop a generalizable forgery detection model. Moreover, since every pixel in an AI-generated image is synthesized, traditional saliency-based forgery explanation methods are not well suited for this task. To address these challenges, we formulate detection and explanation as a unified Forgery Detection and Reasoning task (FDR-Task), leveraging Multi-Modal Large Language Models (MLLMs) to provide accurate detection through reliable reasoning over forgery attributes. To facilitate this task, we introduce the Multi-Modal Forgery Reasoning dataset (MMFR-Dataset), a large-scale dataset containing 120K images across 10 generative models, with 378K reasoning annotations on forgery attributes, enabling comprehensive evaluation of the FDR-Task. Furthermore, we propose FakeReasoning, a forgery detection and reasoning framework with three key components: 1) a dual-branch visual encoder that integrates CLIP and DINO to capture both high-level semantics and low-level artifacts; 2) a Forgery-Aware Feature Fusion Module that leverages DINO's attention maps and cross-attention mechanisms to guide MLLMs toward forgery-related clues; 3) a Classification Probability Mapper that couples language modeling and forgery detection, enhancing overall performance. Experiments across multiple generative models demonstrate that FakeReasoning not only achieves robust generalization but also outperforms state-of-the-art methods on both detection and reasoning tasks.
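
The Classification Probability Mapper is only described at a high level here. A minimal sketch follows. It assumes the MLLM answers with a word such as "real" or "fake", that each word is a single vocabulary id, and that the score is P(fake) renormalized over those two words at the step where the class word is generated. The token ids `REAL_ID` and `FAKE_ID` and all helper names are hypothetical, and the paper's exact mapping may differ.

```python
import math
from typing import List, Sequence

# Hypothetical vocabulary ids of the answer words "real" and "fake";
# actual ids depend on the MLLM's tokenizer and are not given in the source.
REAL_ID = 3
FAKE_ID = 7


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def locate_class_token(generated_ids: Sequence[int]) -> int:
    """Index of the first generated token that is one of the class words."""
    for i, tok in enumerate(generated_ids):
        if tok in (REAL_ID, FAKE_ID):
            return i
    raise ValueError("no classification token in the generated answer")


def classification_score(
    generated_ids: Sequence[int], step_logits: Sequence[Sequence[float]]
) -> float:
    """Map the vocabulary distribution at the class-token step to a fake score.

    step_logits[t] holds the vocabulary logits that produced generated_ids[t].
    """
    pos = locate_class_token(generated_ids)
    probs = softmax(step_logits[pos])
    p_real, p_fake = probs[REAL_ID], probs[FAKE_ID]
    # Renormalize over the two class words so the score lies in [0, 1].
    return p_fake / (p_real + p_fake)


if __name__ == "__main__":
    vocab_size = 10
    answer = [1, 2, FAKE_ID, 9]  # e.g. "This image is fake ..."
    logits = [[0.0] * vocab_size for _ in answer]
    logits[2][FAKE_ID] = 2.0  # the model favors "fake" at the class step
    print(round(classification_score(answer, logits), 3))  # prints 0.881
```

Renormalizing over the two class words is equivalent to a two-way softmax over their logits. This makes the score insensitive to how much probability mass the rest of the vocabulary absorbs. The sketch covers scoring only and does not model how the score is coupled with the language-modeling objective during training.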

Paper Structure

This paper contains 35 sections, 8 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Illustration of the FDR-Task. Different from traditional forgery detection, the FDR-Task leverages MLLMs to perform accurate detection through reliable reasoning over forgery attributes, improving both detection accuracy and interpretability.
  • Figure 2: Construction pipeline of the MMFR-Dataset. GPT-4o is tasked with caption generation and forgery interpretation. For the forgery interpretation task, the prompt instructs GPT-4o to analyze forgery-related attributes. After inspection by human experts, the generated captions and interpretations are compiled into a chain-of-thought to enhance structured and hierarchical reasoning.
  • Figure 3: Statistics of the MMFR-Dataset. (a) Text length distribution of the caption and reasoning stages; (b) attribute distribution of real and fake images.
  • Figure 4: The pipeline of FakeReasoning. FakeReasoning adopts a dual-branch visual encoder combining CLIP and DINO to extract both high-level and low-level visual clues. Each encoder is followed by an adapter that aligns its output with the text embedding space. The Forgery-Aware Feature Fusion Module further fuses CLIP tokens and DINO tokens through a cross-attention mechanism, using DINO's attention maps as forgery-aware priors. The Classification Probability Mapper locates, in the original output logits, the classification token that indicates image authenticity and maps its vocabulary probability distribution into a classification score.
  • Figure 5: (a) Evaluation of the FDR-Task on LOKI benchmark. (b) Detection evaluation on LOKI benchmark. (c) Ablation study on the layers.
  • ...and 2 more figures
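
The fusion step in Figure 4 can be sketched as cross-attention. CLIP tokens act as queries, and DINO tokens act as keys and values. DINO's attention map enters as a log-prior on the attention logits, so tokens that DINO attends to strongly receive more weight. This is a hypothetical simplification. It omits learned projections, multiple heads, and the adapters, and the way the paper actually injects the prior may differ. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def forgery_aware_fusion(
    clip_tokens: Sequence[Vector],
    dino_tokens: Sequence[Vector],
    dino_attn: Sequence[float],
    eps: float = 1e-6,
) -> List[Vector]:
    """Hypothetical prior-biased cross-attention from CLIP tokens to DINO tokens.

    clip_tokens: queries, n_clip vectors of dimension d
    dino_tokens: keys and values, n_dino vectors of dimension d
    dino_attn:   DINO attention map over its n_dino tokens (assumed to sum to 1)
    """
    d = len(clip_tokens[0])
    scale = 1.0 / math.sqrt(d)
    # Log-prior bias: tokens DINO attends to more get larger attention logits.
    bias = [math.log(a + eps) for a in dino_attn]
    fused: List[Vector] = []
    for q in clip_tokens:
        scores = [scale * dot(q, k) + b for k, b in zip(dino_tokens, bias)]
        weights = softmax(scores)
        # Weighted sum of DINO tokens (values = keys, no learned projections).
        attended = [
            sum(w * v[c] for w, v in zip(weights, dino_tokens)) for c in range(d)
        ]
        # Residual connection keeps the original CLIP semantics.
        fused.append([x + y for x, y in zip(q, attended)])
    return fused


if __name__ == "__main__":
    clip = [[0.0, 0.0], [1.0, 0.0]]
    dino = [[1.0, 2.0], [5.0, -3.0]]
    print(forgery_aware_fusion(clip, dino, [0.5, 0.5]))    # uniform prior
    print(forgery_aware_fusion(clip, dino, [0.99, 0.01]))  # peaked prior
```

In this sketch, a uniform prior reduces the module to plain scaled dot-product cross-attention. A sharply peaked prior pulls each CLIP token toward the DINO tokens that the prior highlights, which is the intuition behind treating DINO's attention as a forgery-aware prior.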