Insight-A: Attribution-aware for Multimodal Misinformation Detection
Junjie Wu, Yumeng Fu, Chen Gong, Guohong Fu
TL;DR
Insight-A tackles multimodal misinformation by introducing attribution-aware reasoning for MLLMs. It combines automatic attribution-debiased prompting (ADP), cross-attribution prompting (CAP), and image captioning (IC) to attribute content to forgery-generation patterns and ensure cross-modal consistency. On MMFakeBench, Insight-A achieves state-of-the-art zero-shot performance across multiclass and binary tasks, with ablations confirming the value of ADP, CAP, IC, and attribution. This work provides a practical, attribution-driven paradigm for debunking AI-generated multimodal misinformation in the era of AIGC.
Abstract
AI-generated content (AIGC) technology has emerged as a prevalent means of creating multimodal misinformation on social media platforms, posing unprecedented threats to societal safety. However, standard prompting of multimodal large language models (MLLMs) to identify this emerging misinformation ignores misinformation attribution. To this end, we present Insight-A, which explores attribution with MLLM insights for detecting multimodal misinformation. Insight-A makes two contributions: I) it attributes misinformation to forgery sources, and II) it provides an effective pipeline with hierarchical reasoning that detects distortions across modalities. Specifically, to attribute misinformation to forgery traces based on generation patterns, we devise cross-attribution prompting (CAP) to model the sophisticated correlations between perception and reasoning. Meanwhile, to reduce the subjectivity of human-annotated prompts, automatic attribution-debiased prompting (ADP) is used for task adaptation on MLLMs. Additionally, we design image captioning (IC) to capture visual details that enhance cross-modal consistency checking. Extensive experiments demonstrate the superiority of our proposal, which provides a new paradigm for multimodal misinformation detection in the era of AIGC.
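To make the pipeline described above more concrete, the following is a minimal Python sketch of how a caption-then-check hierarchy could be wired together. It is an illustration under stated assumptions, not the authors' implementation: the label names, the stage order, the prompt wording, and the helpers `build_prompt`, `says_yes`, and `detect` are all hypothetical, and the MLLM is stood in for by a plain callable that maps a prompt string to a reply string.

```python
from typing import Callable

# Hypothetical stage/label pairs for illustration; not taken from the paper.
STAGES = (
    ("text", "textual_distortion"),
    ("image", "visual_distortion"),
    ("cross-modal", "cross_modal_mismatch"),
)


def build_prompt(stage: str, text: str, caption: str) -> str:
    """Compose an attribution-style prompt for one stage (wording is illustrative)."""
    return (
        f"[stage: {stage}]\n"
        f"Claim text: {text}\n"
        f"Image caption: {caption}\n"
        "Name the likely generation pattern if this looks forged, "
        "then answer YES or NO on the last line."
    )


def says_yes(reply: str) -> bool:
    """Read the verdict from the final line of the model reply."""
    lines = reply.strip().splitlines()
    return bool(lines) and lines[-1].strip().upper().startswith("YES")


def detect(text: str, image_ref: str, mllm: Callable[[str], str]) -> str:
    """Run the staged checks and return the first label that is flagged."""
    # Captioning step: obtain visual details before any consistency check.
    caption = mllm(f"Describe the image at {image_ref} in detail.")
    # Hierarchical checks: the first stage that flags a problem decides the label.
    for stage, label in STAGES:
        if says_yes(mllm(build_prompt(stage, text, caption))):
            return label
    return "real"


def fake_mllm(prompt: str) -> str:
    """Stand-in for a real MLLM call, used only to exercise the control flow."""
    if prompt.startswith("Describe"):
        return "A crowd standing in a flooded street."
    if "[stage: image]" in prompt:
        return "pattern: synthetic image artifacts\nYES"
    return "no forgery pattern found\nNO"


print(detect("Flood hits the city centre.", "img_001.png", fake_mllm))
```

In practice `fake_mllm` would be replaced by a call to an actual multimodal model that receives the image itself rather than a file reference; the sketch only shows how captioning can feed later attribution-style checks.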
