Beyond Detection: Exploring Evidence-based Multi-Agent Debate for Misinformation Intervention and Persuasion
Chen Han, Yijia Ma, Jin Tan, Wenzhen Zheng, Xijin Tang
TL;DR
This work tackles misinformation by introducing ED2D, an evidence-based multi-agent debate (MAD) framework that integrates external factual retrieval into adversarial reasoning and uses persuasive debunking to influence user beliefs. ED2D combines a five-stage MAD process with a Wikipedia-based evidence pipeline to ground arguments and produce transparent, debatable explanations, achieving strong detection performance and interpretability across three benchmarks, including the real-world Snopes25 dataset. A controlled human-subject study demonstrates that ED2D can be as persuasive as expert fact-checks when correct, but also reveals the risk that incorrect AI outputs can mislead users, underscoring the need for safeguards. A public platform accompanies ED2D to promote transparency, epistemic vigilance, and collaborative fact-checking, highlighting both practical utility and safety considerations for deploying persuasive AI in misinformation intervention.
Abstract
Multi-agent debate (MAD) frameworks have emerged as promising approaches for misinformation detection by simulating adversarial reasoning. While prior work has focused on detection accuracy, it overlooks the importance of helping users understand the reasoning behind factual judgments and develop future resilience. The debate transcripts generated during MAD offer a rich but underutilized resource for transparent reasoning. In this study, we introduce ED2D, an evidence-based MAD framework that extends previous approaches by incorporating factual evidence retrieval. More importantly, ED2D is designed not only as a detection framework but also as a persuasive multi-agent system aimed at correcting user beliefs and discouraging misinformation sharing. We compare the persuasive effects of ED2D-generated debunking transcripts with those authored by human experts. Results demonstrate that ED2D outperforms existing baselines across three misinformation detection benchmarks. When ED2D generates correct predictions, its debunking transcripts exhibit persuasive effects comparable to those of human experts; however, when ED2D misclassifies, its accompanying explanations may inadvertently reinforce users' misconceptions, even when presented alongside accurate human explanations. Our findings highlight both the promise and the potential risks of deploying MAD systems for misinformation intervention. We further develop a public community website to help users explore ED2D, fostering transparency, critical thinking, and collaborative fact-checking.
