Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

Qi He, Cheng Qian, Xiusi Chen, Bingxiang He, Yi R. Fung, Heng Ji

TL;DR

Veri-R1 tackles online claim verification by training LLMs with online reinforcement learning to interact with a retrieval engine, so that the model learns to plan, search, and reason iteratively. The framework combines a unified rollout protocol, a multi-component reward (format, evidence, and label terms, plus a validity weight), and GRPO-based optimization to promote faithful evidence gathering and correct judgments. Empirical results across FEVEROUS, EX-FEVER, FEVER, HOVER, and SciFACT show substantial gains in joint accuracy and evidence quality, with online RL often surpassing larger models and SFT baselines. The work advances practical, faithful verification by aligning training signals with real-world verification objectives, and the code is released to foster community progress.
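
As a rough illustration of how a multi-component reward can feed GRPO-style optimization, the sketch below scores each rollout in a group and then normalizes the rewards within that group. The way the format, evidence, label, and validity terms are combined in combined_reward is a hypothetical choice made for this example and is not the paper's formula; the names combined_reward and group_relative_advantages are likewise assumptions. Only the group-relative normalization (reward minus group mean, divided by group standard deviation) follows the commonly described GRPO advantage.

```python
from statistics import mean, pstdev
from typing import List


def combined_reward(format_ok: bool, evidence_score: float,
                    label_correct: bool, validity_weight: float) -> float:
    """Hypothetical combination of the four reward ingredients.

    Assumption for illustration only: a malformed rollout earns nothing,
    otherwise the label and evidence terms are summed and scaled by the
    validity weight. The paper's actual formula may differ.
    """
    if not format_ok:
        return 0.0
    return validity_weight * (float(label_correct) + evidence_score)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same claim.

    Each advantage is (reward - group mean) / (group std + eps), so rollouts
    are judged relative to their siblings rather than by a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is also plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four rollouts for one claim: (format_ok, evidence_score, label_correct, validity_weight)
    rollouts = [(True, 0.5, True, 1.0), (True, 0.0, False, 1.0),
                (False, 1.0, True, 1.0), (True, 1.0, True, 1.0)]
    rewards = [combined_reward(*r) for r in rollouts]
    print(rewards)
    print(group_relative_advantages(rewards))
```

When every rollout in a group receives the same reward, all advantages are zero and that group contributes no gradient signal, which is one reason the shape of the reward matters in this kind of setup.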

Abstract

Claim verification with large language models (LLMs) has recently attracted growing attention, owing to their strong reasoning capabilities and transparent verification processes compared to traditional answer-only judgments. However, existing approaches to online claim verification, which requires iterative evidence retrieval and reasoning, still mainly rely on prompt engineering or pre-designed reasoning workflows, without unified training to improve the necessary skills. Therefore, we introduce Veri-R1, an online reinforcement learning (RL) framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. This dynamic interaction between the LLM and retrieval systems more accurately reflects real-world verification scenarios and fosters comprehensive verification skills. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing its larger-scale model counterparts. Ablation studies further reveal the impact of reward components and the link between output logits and label accuracy. Our results highlight the effectiveness of online RL for precise and faithful claim verification, providing an important foundation for future research. We release our code to support community progress in LLM-empowered claim verification.
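
To make the online setting concrete, here is a minimal sketch of an iterative retrieve-then-answer loop. Everything in it is an assumption for illustration: the toy corpus, the keyword-overlap keyword_search, the rule-based stub_policy standing in for the LLM, the label strings, and the turn limit. It does not reproduce the Veri-R1 rollout protocol, only the general pattern of alternating search actions with a final verdict.

```python
from typing import Callable, Dict, List, Tuple

# Toy corpus standing in for a real retrieval index (hypothetical data).
CORPUS: Dict[str, str] = {
    "doc_paris": "Paris is the capital of France",
    "doc_berlin": "Berlin is the capital of Germany",
}


def keyword_search(query: str, corpus: Dict[str, str], k: int = 1) -> List[str]:
    """Rank documents by word overlap with the query and return the top-k ids."""
    q = set(query.lower().split())
    ranked = sorted(corpus.items(),
                    key=lambda kv: -len(q & set(kv[1].lower().split())))
    return [doc_id for doc_id, _ in ranked[:k]]


def stub_policy(claim: str, evidence: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: search once with the claim, then answer."""
    if not evidence:
        return "search", claim
    return "answer", "SUPPORTED"


def verify_online(claim: str,
                  policy: Callable[[str, List[str]], Tuple[str, str]],
                  search: Callable[[str], List[str]],
                  max_turns: int = 3) -> Tuple[str, List[str]]:
    """Alternate between search actions and a final answer, up to max_turns."""
    evidence: List[str] = []
    for _ in range(max_turns):
        action, payload = policy(claim, evidence)
        if action == "answer":
            return payload, evidence
        # Treat any other action as a search query and keep new documents only.
        evidence.extend(d for d in search(payload) if d not in evidence)
    # Turn budget exhausted without a verdict.
    return "NOT ENOUGH INFO", evidence


if __name__ == "__main__":
    label, docs = verify_online(
        "Paris is the capital of France",
        stub_policy,
        lambda q: keyword_search(q, CORPUS),
    )
    print(label, docs)
```

In an RL setup of the kind the abstract describes, the returned label and the collected evidence ids are what a reward function would score; an offline variant would skip the loop and hand the evidence to the model directly.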

Paper Structure

This paper contains 57 sections, 10 equations, 11 figures, 7 tables.

Figures (11)

  • Figure 1: Conceptual comparison of Offline Claim Verification and Online Claim Verification. In the offline setting, models are provided with both the claim and relevant evidence, requiring only reasoning to produce the final answer. In the online setting, models must iteratively retrieve relevant information from a corpus before reasoning and producing the final answer.
  • Figure 2: Comprehensive framework of Veri-R1, depicting the Online Claim Verification (ONCV) and Offline Claim Verification (OFFCV) workflows together with the calculation of label, evidence, and format rewards.
  • Figure 3: To mitigate annotation-related issues and ambiguities in the raw dataset, we developed a pipeline that simulates offline rollout, using GPT-4o to filter and preserve only high-quality data.
  • Figure 4: System Prompt for Online Claim Verification.
  • Figure 5: Training curves from the ablation study. Panels (a)–(b) report evidence score under the evidence score ablation, while panels (c)–(d) show verification accuracy and panels (e)–(f) show evidence cover rate under the validity weight ablation.
  • ...and 6 more figures