Insight-A: Attribution-aware for Multimodal Misinformation Detection

Junjie Wu, Yumeng Fu, Chen Gong, Guohong Fu

TL;DR

Insight-A tackles multimodal misinformation by introducing attribution-aware reasoning for MLLMs. It combines automatic attribution-debiased prompting (ADP), cross-attribution prompting (CAP), and image captioning (IC) to attribute content to forgery-generation patterns and ensure cross-modal consistency. On MMFakeBench, Insight-A achieves state-of-the-art zero-shot performance across multiclass and binary tasks, with ablations confirming the value of ADP, CAP, IC, and attribution. This work provides a practical, attribution-driven paradigm for debunking AI-generated multimodal misinformation in the era of AIGC.

Abstract

AI-generated content (AIGC) technology has emerged as a prevalent means of creating multimodal misinformation on social media platforms, posing unprecedented threats to societal safety. However, standard prompting, which uses multimodal large language models (MLLMs) to identify this emerging misinformation, ignores misinformation attribution. To this end, we present Insight-A, which explores attribution with MLLM insights for detecting multimodal misinformation. Insight-A makes two contributions: I) attributing misinformation to forgery sources, and II) building an effective pipeline with hierarchical reasoning that detects distortions across modalities. Specifically, to attribute misinformation to forgery traces based on generation patterns, we devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. Meanwhile, to reduce the subjectivity of human-annotated prompts, automatic attribution-debiased prompting (ADP) is used for task adaptation on MLLMs. Additionally, we design image captioning (IC) to capture visual details for enhancing cross-modal consistency checking. Extensive experiments demonstrate the superiority of our proposal, which provides a new paradigm for multimodal misinformation detection in the era of AIGC.
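
To make the three components named in the abstract concrete, here is a minimal sketch of how IC, ADP, and CAP might be chained around a single model interface. Everything in it is an assumption for illustration: the `MLLM` callable type, the `Post` class, the function names, the prompt wording, and the order of the steps are hypothetical and are not taken from the paper. The hierarchical reasoning mentioned in the abstract is not modeled, and a canned stub replaces the real model so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: (prompt, image_path) -> the model's text response.
MLLM = Callable[[str, Optional[str]], str]


@dataclass
class Post:
    """A text-image pair to be checked (hypothetical container)."""
    text: str
    image_path: str


def caption_image(mllm: MLLM, post: Post) -> str:
    """IC step (assumed form): ask the model to describe the image."""
    return mllm("Describe the visual details of this image.", post.image_path)


def debias_prompt(mllm: MLLM, seed_prompt: str) -> str:
    """ADP step (assumed form): let the model rewrite a human-written prompt."""
    return mllm(f"Rewrite this instruction neutrally:\n{seed_prompt}", None)


def cross_attribution(mllm: MLLM, task_prompt: str, post: Post, caption: str) -> str:
    """CAP step (assumed form): request attribution for both modalities."""
    prompt = (
        f"{task_prompt}\n"
        f"Text: {post.text}\n"
        f"Image caption: {caption}\n"
        "Attribute the text and the image to a likely generation pattern, "
        "check whether they are consistent, and give a verdict."
    )
    return mllm(prompt, post.image_path)


def detect(mllm: MLLM, post: Post, seed_prompt: str) -> str:
    """Chain the three steps; the ordering here is an assumption."""
    caption = caption_image(mllm, post)
    task_prompt = debias_prompt(mllm, seed_prompt)
    return cross_attribution(mllm, task_prompt, post, caption)


def stub_mllm(prompt: str, image_path: Optional[str] = None) -> str:
    """Stand-in for a real model call; returns canned text per step."""
    if prompt.startswith("Describe"):
        return "a flooded street at night"
    if prompt.startswith("Rewrite"):
        return "Classify the post as real or fake."
    return "verdict: unknown"


result = detect(stub_mllm, Post("Flood hits city", "img.png"), "Is this fake?")
```

Replacing `stub_mllm` with a wrapper around an actual MLLM client would be the only change needed to try the same flow against a real model; the prompts themselves would need to follow the paper's templates rather than the placeholder wording used here.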

Paper Structure

This paper contains 29 sections, 5 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Existing misinformation detection methods (a) and (b) vs. our Insight-A (c). Previous methods either train dedicated detectors or use standard prompting to combat misinformation. In contrast, Insight-A attributes multimodal content to generation patterns and thus makes more accurate decisions.
  • Figure 2: The overall architecture of Insight-A, which consists of automatic attribution-debiased prompting, cross-attribution prompting, and image captioning.
  • Figure 3: Performance (detection success rate) of different methods on two generation categories.
  • Figure 4: The qualitative results of Insight-A.