MMD-Thinker: Adaptive Multi-Dimensional Thinking for Multimodal Misinformation Detection

Junjie Wu, Guohong Fu

TL;DR

The paper tackles multimodal misinformation detection in the era of AI-generated content by addressing two weaknesses of general-purpose MLLMs: insufficient domain-specific reasoning and the biases that come from relying on a single thinking mode. It introduces MMD-Thinker, a two-stage framework for adaptive multi-dimensional thinking: it designs task-specific thinking modes, teaches them to the model through task-specific instruction tuning, and then optimizes reasoning trajectories via reinforcement learning with a mixed advantage $A_i = A_i^M + A_i^S$. A new MMR dataset of 8K+ image-text pairs annotated with reasoning traces and classification labels supports training and evaluation. Experiments on in-domain and out-of-domain benchmarks show state-of-the-art performance with flexible inference and efficient token usage. The two stages are guided by the learning objectives $\mathcal{L}_1$ and $\mathcal{L}_2$, and the approach offers a scalable path toward reliable multimodal misinformation detection in the AI-generated content era.
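The mixed advantage can be illustrated with a minimal Python sketch. The summary gives only the sum $A_i = A_i^M + A_i^S$ and does not define the two components, so this sketch assumes each one is a GRPO-style group-relative advantage: a per-trajectory reward normalized by the mean and standard deviation over a group of responses sampled for the same input. The reward lists `rewards_m` and `rewards_s` and the functions `group_normalized` and `mixed_advantage` are hypothetical names, not the authors' code.

```python
from statistics import mean, pstdev


def group_normalized(rewards, eps=1e-6):
    """GRPO-style group-relative advantage: (r_i - mean(r)) / (std(r) + eps).

    Assumption: this is how each advantage component is computed; the
    summary does not specify it.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


def mixed_advantage(rewards_m, rewards_s, eps=1e-6):
    """Combine two hypothetical reward signals as A_i = A_i^M + A_i^S."""
    if len(rewards_m) != len(rewards_s):
        raise ValueError("both reward lists must cover the same group of trajectories")
    a_m = group_normalized(rewards_m, eps)  # hypothetical component A^M
    a_s = group_normalized(rewards_s, eps)  # hypothetical component A^S
    return [m + s for m, s in zip(a_m, a_s)]
```

For a group of four trajectories, `mixed_advantage([1, 0, 1, 0], [0.5, 0.5, 1, 0])` ranks the third trajectory highest, since it scores above the group mean on both signals. Summing two separately normalized terms keeps either signal from dominating merely because its raw reward scale is larger.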

Abstract

Multimodal misinformation floods various social media platforms and continues to evolve in the era of AI-generated content (AIGC). Such emerging misinformation, with its low creation cost and high deceptiveness, poses significant threats to society. While recent studies leverage general-purpose multimodal large language models (MLLMs) and achieve remarkable detection results, they face two critical limitations: (1) Insufficient reasoning: general-purpose MLLMs often follow a uniform reasoning paradigm and, lacking task-specific knowledge of multimodal misinformation detection, generate inaccurate explanations and judgments. (2) Reasoning biases: a single thinking mode leads detectors down a suboptimal path to judgment, making it hard to keep pace with fast-growing and increasingly intricate multimodal misinformation. In this paper, we propose MMD-Thinker, a two-stage framework for multimodal misinformation detection through adaptive multi-dimensional thinking. First, we design tailored thinking modes for multimodal misinformation detection. Second, we adopt task-specific instruction tuning to inject these thinking modes into general-purpose MLLMs. Third, we leverage a reinforcement learning strategy with a mixed advantage function that incentivizes reasoning capabilities along the generated trajectories. Furthermore, we construct the multimodal misinformation reasoning (MMR) dataset, which encompasses more than 8K image-text pairs with both reasoning processes and classification labels, to advance the realm of multimodal misinformation detection. Experimental results demonstrate that MMD-Thinker achieves state-of-the-art performance on both in-domain and out-of-domain benchmark datasets while maintaining flexible inference and efficient token usage. Code will be publicly available on GitHub.
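The structure of an MMR example can be sketched as a small Python record. The summary states only that each example is an image-text pair with a reasoning process and a classification label; the binary label set, the field names, the `<think>`/`<answer>` target format, and the helpers `to_sft_target` and `label_counts` are all assumptions made for illustration, not the released dataset schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # Assumption: binary labels; the actual MMR label set is not given here.
    REAL = "real"
    FAKE = "fake"


@dataclass(frozen=True)
class MMRSample:
    image_path: str  # hypothetical field: location of the image
    text: str        # the text paired with the image
    reasoning: str   # reasoning trace used to supervise the thinking process
    label: Label     # classification label

    def to_sft_target(self) -> str:
        # Hypothetical instruction-tuning target: reasoning, then verdict.
        return f"<think>{self.reasoning}</think><answer>{self.label.value}</answer>"


def label_counts(samples):
    """Count how many samples carry each label (e.g. to check class balance)."""
    return Counter(s.label for s in samples)
```

Keeping the reasoning trace and the label in one record lets the same example serve both instruction tuning (reasoning plus verdict as the target) and label-based evaluation.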

Paper Structure

This paper contains 11 sections, 5 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Illustrations of different methods in the task of multimodal misinformation detection.
  • Figure 2: Overview of the MMD-Thinker framework for multimodal misinformation detection. The framework consists of three critical modules: Multi-dimensional thinking mode design, multi-dimensional thinking mode learning, and adaptive multi-dimensional thinking mode policy optimization.