T^2Agent: A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search

Xing Cui, Yueying Zou, Zekun Li, Peipei Li, Xinyuan Xu, Xuannan Liu, Huaibo Huang

TL;DR

T^2Agent addresses the challenge of real-world multimodal misinformation arising from mixed forgery sources by integrating an extensible toolkit with Monte Carlo Tree Search (MCTS) to perform dynamic, multi-source verification. A greedy tool selector forms a task-specific action subset, while a multi-source verification extension of MCTS decomposes the task into subtasks for different forgery sources and uses a dual reward to balance exploration and evidence exploitation. Empirical results on MMFakeBench and AMG show substantial gains over baselines and validate training-free effectiveness, with notable improvements in accuracy and macro-F1 and favorable cost-performance trade-offs in certain settings. The work demonstrates a scalable, adaptable framework for misinformation detection that can generalize to emerging forgery types without additional training, enhancing practical deployment potential.

Abstract

Real-world multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification. However, existing methods mainly rely on static pipelines and limited tool usage, which restricts their ability to handle such complexity and diversity. To address this challenge, we propose T^2Agent, a novel misinformation detection agent that incorporates an extensible toolkit with Monte Carlo Tree Search (MCTS). The toolkit consists of modular tools such as web search, forgery detection, and consistency analysis. Each tool is described using standardized templates, enabling seamless integration and future expansion. To avoid the inefficiency of using all tools simultaneously, a greedy search-based selector is proposed to identify a task-relevant subset. This subset then serves as the action space for MCTS to dynamically collect evidence and perform multi-source verification. To better align MCTS with the multi-source nature of misinformation detection, T^2Agent extends traditional MCTS with multi-source verification, which decomposes the task into coordinated subtasks targeting different forgery sources. A dual reward mechanism containing a reasoning trajectory score and a confidence score is further proposed to encourage a balance between exploration across mixed forgery sources and exploitation for more reliable evidence. We conduct ablation studies to confirm the effectiveness of the tree search mechanism and tool usage. Extensive experiments further show that T^2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks, demonstrating its strong potential as a training-free detector.
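The greedy search-based selector mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the selection criterion, so this example assumes a generic forward greedy selection: starting from an empty set, it repeatedly adds whichever tool most improves a score and stops when no tool helps. The function names (`greedy_select`, `toy_evaluate`), the tool name `ocr`, and all utility numbers are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_select(
    tools: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Forward greedy selection: repeatedly add the tool with the best gain."""
    remaining = list(tools)
    chosen: List[str] = []
    best_score = evaluate(frozenset())
    while remaining:
        # Score every one-tool extension of the current subset.
        scored = [(evaluate(frozenset(chosen + [t])), t) for t in remaining]
        score, tool = max(scored, key=lambda pair: pair[0])
        if score - best_score <= min_gain:
            break  # no remaining tool improves the score enough
        chosen.append(tool)
        remaining.remove(tool)
        best_score = score
    return chosen


# Made-up utilities purely for illustration (not results from the paper).
TOY_UTILITY = {
    "web_search": 0.30,
    "forgery_detection": 0.25,
    "consistency_analysis": 0.20,
    "ocr": 0.0,  # hypothetical tool that adds cost but no benefit
}


def toy_evaluate(subset: FrozenSet[str]) -> float:
    # Additive utility minus a small per-tool cost, standing in for a
    # validation score traded off against tool-call overhead (an assumption).
    return sum(TOY_UTILITY[t] for t in subset) - 0.05 * len(subset)


selected = greedy_select(TOY_UTILITY, toy_evaluate)
```

In this toy setup the unhelpful tool is left out because its marginal gain is negative. In practice `evaluate` would be replaced by whatever task-level score the selector optimizes, for example accuracy on a held-out subset.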

Paper Structure

This paper contains 22 sections, 7 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: (1) MMDAgent adopts a fixed verification process. (2) Our T$^2$Agent builds a multi-source verification framework inspired by MCTS, enabling dynamic verification through adaptive tool selection and evidence integration.
  • Figure 2: Overview of T$^2$Agent. The toolkit acts as the action space, with a greedy search selecting relevant tools. T$^2$Agent extends MCTS via multi-source verification, breaking tasks into subtasks targeting different forgery sources. At each node of the tree search process, the agent plans verification paths, selects tools based on task requirements, and evaluates outcomes using a dual reward function that balances exploration across forgery sources with evidence exploitation.
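The tree search with a dual reward described in the Figure 2 caption can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it uses the standard UCT selection rule, and it assumes the dual reward is a convex combination of the reasoning trajectory score and the confidence score with a hypothetical weight `alpha`. The exact form of the reward, the `Node` structure, and all numeric values are assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    action: str                       # tool call that led to this node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def dual_reward(trajectory_score: float, confidence: float,
                alpha: float = 0.5) -> float:
    # Assumed combination: a convex mix of two scores, each in [0, 1].
    return alpha * trajectory_score + (1.0 - alpha) * confidence


def uct_select(parent: Node, c: float = 1.4) -> Node:
    # Unvisited children are tried first; otherwise take the best UCT score.
    for child in parent.children:
        if child.visits == 0:
            return child
    log_n = math.log(parent.visits)
    return max(
        parent.children,
        key=lambda ch: ch.mean_value() + c * math.sqrt(log_n / ch.visits),
    )


def backpropagate(node: Optional[Node], reward: float) -> None:
    # Propagate the reward from the evaluated node back up to the root.
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent


# Tiny demonstration with two candidate tool calls under one root.
root = Node("root")
a = Node("web_search", parent=root)
b = Node("forgery_detection", parent=root)
root.children = [a, b]

first = uct_select(root)                      # unvisited child comes first
backpropagate(first, dual_reward(0.8, 0.6))   # made-up scores
second = uct_select(root)                     # the other unvisited child
backpropagate(second, dual_reward(0.2, 0.4))  # made-up scores
third = uct_select(root)                      # now decided by the UCT score
```

The exploration constant `c` controls how strongly rarely visited tool calls are favored over those with high average reward, which is the exploration and exploitation balance the caption refers to.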