Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

Xuan Zhang; Wei Gao

TL;DR

This work tackles the difficulty of training retrieval models when using black-box LLMs for fact-checking by introducing Fine-grained Feedback with Reinforcement Retrieval (FFRR). FFRR uses a two-level reward framework—document-level and question-level—to guide a dense retrieval policy via policy gradient, incorporating intermediate LLM feedback to improve evidence selection. Across two public datasets, FFRR substantially outperforms strong LLM-enabled and non-LLM baselines, with the hybrid document-and-question approach yielding the best results. The approach enables effective exploration beyond top-$K$ retrieval while keeping inference overhead low, offering practical gains for real-world news verification tasks.

Abstract

Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, because advanced large language models (LLMs) are black boxes and task-specific supervision signals are not retrieval-oriented, training the retrieval model is challenging in the black-box LLM setting. We propose an approach leveraging Fine-grained Feedback with Reinforcement Retrieval (FFRR) to enhance fact-checking on news claims using a black-box LLM. FFRR adopts a two-level strategy to gather fine-grained feedback from the LLM, which rates the retrieved documents against the non-retrieval ground truth of the task; this feedback serves as the reward for optimizing the retrieval policy. We evaluate our model on two public datasets for real-world news claim verification, and the results demonstrate that FFRR achieves significant improvements over strong LLM-enabled and non-LLM baselines.
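To make the idea of optimizing a retrieval policy from reward feedback more concrete, the following is a minimal sketch of a REINFORCE-style update, not the authors' implementation. It assumes a toy policy whose parameters are one logit per candidate document (FFRR itself uses a dense retriever), and `toy_reward` is a hypothetical stand-in for the LLM's fine-grained feedback. All function names and hyperparameters here are illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Turn raw document logits into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(scores, reward_fn, lr=0.5, rng=random):
    """Apply one REINFORCE update to per-document logits."""
    probs = softmax(scores)
    # Sample one document index from the current policy.
    idx = rng.choices(range(len(scores)), weights=probs, k=1)[0]
    reward = reward_fn(idx)
    # d/d(logit_j) log pi(idx) = 1[j == idx] - probs[j]
    return [
        s + lr * reward * ((1.0 if j == idx else 0.0) - probs[j])
        for j, s in enumerate(scores)
    ]


def toy_reward(idx):
    # Hypothetical stand-in for LLM feedback: only document 2 is useful.
    return 1.0 if idx == 2 else 0.0


def train(num_docs=5, steps=300, seed=0):
    """Run repeated updates and return the final retrieval distribution."""
    rng = random.Random(seed)
    scores = [0.0] * num_docs
    for _ in range(steps):
        scores = reinforce_step(scores, toy_reward, rng=rng)
    return softmax(scores)


if __name__ == "__main__":
    print(train())
```

Because the policy samples documents rather than always taking the highest-scoring ones, it can receive reward for candidates that a fixed top-$K$ cut would never show to the LLM. In this sketch, probability mass shifts toward the rewarded document over the course of training.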

Paper Structure (26 sections, 3 equations, 5 figures, 5 tables)

Introduction
Related Work
Problem Definition
Our Methodology
Document-level Retrieval Policy
Training Rewards.
Inference.
Prompting LLM.
Question-level Retrieval Policy
Retrieval Policy with Hybrid Rewards
Optimizing Retrieval Policy
Experiments and Results
Experimental Setup
Datasets.
Implementation Details.
...and 11 more sections

Figures (5)

Figure 1: The black-box LLM hinders end-to-end task training through gradient backpropagation. However, manual (sparse) and LLM (fine-grained) feedback can be used to optimize the retrieval model.
Figure 2: FFRR policy models with different fine-grained rewards from LLM. In Figure \ref{fig:model_0}, $d_i$ is shaded to indicate that it is the relevant document that potentially aids in verifying the truthfulness of the claim.
Figure 3: FFRR with two-level policies combined.
Figure 4: Effect of number of documents $K$.
Figure 5: Effect of final reward $r_g$.
