Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang, Dian Li, Qiang Zhang, Jun Chen, Gang Liu, Junxiong Lin, Jiahong Yan, Jiawei Liu, Zheng-Jun Zha
TL;DR
This work tackles the challenge of video misinformation detection by releasing FakeVV, a large-scale, richly annotated video-text benchmark, and proposing Fact-R1, a reasoning-enhanced detector that unites deep multimodal reasoning with collaborative rule-based reinforcement learning. Fact-R1 is trained in three stages—long-CoT instruction tuning, Direct Preference Optimization, and Group Relative Policy Optimization with a verifiable reward function—enabling emergent, explainable reasoning about manipulated entities in video content. Empirical results show Fact-R1 achieving state-of-the-art performance across three short-video misinformation datasets, with ablations and explainability analyses demonstrating the importance of staged training, reward design, and auxiliary tasks for robust reasoning. The work presents a new paradigm that bridges large-scale video understanding, reasoning-guided alignment, and verifiable explainability, with potential to assist human fact-checkers while highlighting considerations for safe, responsible deployment.
Abstract
The rapid spread of multimodal misinformation on social media has raised growing concerns, while research on video misinformation detection remains limited due to the lack of large-scale, diverse datasets. Existing methods often overfit to rigid templates and lack deep reasoning over deceptive content. To address these challenges, we introduce FakeVV, a large-scale benchmark comprising over 100,000 video-text pairs with fine-grained, interpretable annotations. We further propose Fact-R1, a novel framework that integrates deep reasoning with collaborative rule-based reinforcement learning. Fact-R1 is trained through a three-stage process: (1) misinformation long-Chain-of-Thought (CoT) instruction tuning, (2) preference alignment via Direct Preference Optimization (DPO), and (3) Group Relative Policy Optimization (GRPO) using a novel verifiable reward function. This training process enables Fact-R1 to exhibit emergent reasoning behaviors comparable to those observed in advanced text-based reinforcement learning systems, but in the more complex multimodal misinformation setting. Our work establishes a new paradigm for misinformation detection, bridging large-scale video understanding, reasoning-guided alignment, and interpretable verification.
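As a rough illustration of stage (3), the sketch below shows the group-relative advantage computation commonly associated with GRPO: several responses are sampled for the same input, each is scored by a rule-based reward, and each reward is normalized against the group's mean and standard deviation. The `verifiable_reward` function here is a hypothetical stand-in (a simple label match); the abstract does not specify Fact-R1's actual reward, and the choice of population standard deviation and the `eps` value are assumptions made for this example.

```python
from statistics import mean, pstdev


def verifiable_reward(predicted_label: str, true_label: str) -> float:
    """Hypothetical rule-based reward: 1.0 if the verdict matches the label, else 0.0.

    This is an illustrative placeholder, not the reward used by Fact-R1.
    """
    return 1.0 if predicted_label.strip().lower() == true_label.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own sampling group.

    advantage_i = (r_i - mean(group)) / (std(group) + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled verdicts for one video-text pair whose ground truth is "fake".
sampled_verdicts = ["fake", "real", "fake", "fake"]
rewards = [verifiable_reward(v, "fake") for v in sampled_verdicts]
advantages = group_relative_advantages(rewards)
```

In this toy group the one incorrect verdict receives a negative advantage and the three correct ones receive a positive advantage, so a policy-gradient update would shift probability toward the correct responses. When every response in a group earns the same reward, all advantages are zero and the group contributes no learning signal.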
