Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning
Qi He, Cheng Qian, Xiusi Chen, Bingxiang He, Yi R. Fung, Heng Ji
TL;DR
Veri-R1 tackles the challenge of online claim verification by using online reinforcement learning to train LLMs to interact with a retrieval engine, enabling iterative planning, searching, and reasoning. The framework employs a unified rollout protocol, a multi-component reward design (format, evidence, label, and validity weight), and GRPO-based policy optimization to promote faithful evidence gathering and correct judgments. Empirical results across FEVEROUS, EX-FEVER, FEVER, HOVER, and SciFACT show substantial gains in joint accuracy and evidence quality, with online RL often surpassing larger models and SFT baselines. The work advances practical, faithful verification by aligning training signals with real-world verification objectives, and it releases code to foster community progress.
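To make the reward-plus-GRPO idea concrete, here is a minimal sketch. This section does not give the actual reward formula, so the combination below is an assumption for illustration only. In the sketch, format acts as a gate, the label reward is 0/1, the evidence reward is recall over gold evidence IDs, and a hypothetical `validity_weight` scales the evidence term. The names `Rollout`, `hypothetical_reward`, and `group_relative_advantages` are also hypothetical. The advantage step does follow the standard GRPO recipe: sample a group of rollouts for the same claim, then normalize each reward by the group mean and standard deviation, so no learned value function is needed.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    # Hypothetical summary of one verification trajectory.
    well_formatted: bool          # output followed the expected format protocol
    predicted_label: str          # e.g. "SUPPORTS", "REFUTES", "NOT ENOUGH INFO"
    retrieved_evidence: set[str]  # evidence IDs the model cited


def hypothetical_reward(r: Rollout, gold_label: str, gold_evidence: set[str],
                        validity_weight: float = 0.5) -> float:
    """Assumed reward combination, NOT the paper's exact formula."""
    # Format gate (assumption): malformed outputs receive zero reward.
    if not r.well_formatted:
        return 0.0
    label_reward = 1.0 if r.predicted_label == gold_label else 0.0
    # Evidence reward as recall of gold evidence IDs (assumption).
    if gold_evidence:
        evidence_reward = len(r.retrieved_evidence & gold_evidence) / len(gold_evidence)
    else:
        evidence_reward = 0.0
    return label_reward + validity_weight * evidence_reward


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - group mean) / (group std + eps).

    Uses the population std; some implementations use the sample std instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


# Usage: one group of four rollouts sampled for the same claim.
gold_label, gold_ev = "REFUTES", {"e1", "e2"}
group = [
    Rollout(True, "REFUTES", {"e1", "e2"}),   # correct label, full evidence
    Rollout(True, "REFUTES", {"e1"}),         # correct label, partial evidence
    Rollout(True, "SUPPORTS", {"e3"}),        # wrong label, wrong evidence
    Rollout(False, "REFUTES", {"e1", "e2"}),  # malformed output
]
rewards = [hypothetical_reward(r, gold_label, gold_ev) for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.5, 1.25, 0.0, 0.0]
print([round(a, 3) for a in advantages])
```

Because the advantages are centered within each group, rollouts that gather more complete evidence and reach the correct label are pushed up relative to their siblings, while malformed or wrong rollouts are pushed down.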
Abstract
Claim verification with large language models (LLMs) has recently attracted growing attention because, compared with traditional answer-only judgments, LLMs offer strong reasoning capabilities and transparent verification processes. However, existing approaches to online claim verification, which requires iterative evidence retrieval and reasoning, still rely mainly on prompt engineering or pre-designed reasoning workflows and lack unified training to improve the necessary skills. We therefore introduce Veri-R1, an online reinforcement learning (RL) framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. This dynamic interaction between the LLM and the retrieval system more accurately reflects real-world verification scenarios and fosters comprehensive verification skills. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing larger-scale counterpart models. Ablation studies further reveal the impact of individual reward components and the link between output logits and label accuracy. Our results highlight the effectiveness of online RL for precise and faithful claim verification and provide an important foundation for future research. We release our code to support community progress in LLM-empowered claim verification.
