LLatrieval: LLM-Verified Retrieval for Verifiable Generation

Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu

TL;DR

This work tackles the bottleneck in verifiable generation by enabling the LLM to actively critique and refine retrieval through verify-update iterations, ensuring retrieved documents sufficiently support answers. The proposed LLatrieval framework uses classification and scoring to verify retrieval, progressive selection and missing-info queries to update the candidate set, and an iterative loop to converge on verifiable evidence. Empirical results on the ALCE benchmark demonstrate state-of-the-art gains in both correctness and verifiability across multiple datasets and LLMs, with analyses highlighting component contributions, threshold trade-offs, and cross-model generality. The approach offers a scalable path to more reliable, evidence-backed long-form generation, though it acknowledges latency and bias considerations as avenues for future improvement.

Abstract

Verifiable generation aims to let the large language model (LLM) generate text with supporting documents, which enables the user to flexibly verify the answer and makes the LLM's output more reliable. Retrieval plays a crucial role in verifiable generation. Specifically, the retrieved documents not only supplement knowledge to help the LLM generate correct answers, but also serve as supporting evidence for the user to verify the LLM's output. However, the widely used retrievers become the bottleneck of the entire pipeline and limit the overall performance. Their capabilities are usually inferior to those of LLMs, since they often have far fewer parameters and have not been demonstrated to scale well to the size of LLMs. If the retriever does not correctly find the supporting documents, the LLM cannot generate a correct and verifiable answer, which overshadows the LLM's remarkable abilities. To address these limitations, we propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question. Thus, the LLM can iteratively provide feedback to retrieval and steer the retrieval result toward fully supporting verifiable generation. Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
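The verify-update iteration described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`verify_update_retrieval`, `retrieve`, `verify`, `update`, `missing_info_query`, `max_iters`) is hypothetical, and the toy functions below are simple keyword-matching stand-ins for what would be LLM and retriever calls in the actual framework.

```python
from typing import Callable, List


def verify_update_retrieval(
    question: str,
    retrieve: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], bool],
    update: Callable[[str, List[str], List[str]], List[str]],
    missing_info_query: Callable[[str, List[str]], str],
    max_iters: int = 4,  # assumed iteration cap, not taken from the paper
) -> List[str]:
    """Refine the document set until `verify` accepts it or the cap is hit."""
    query = question
    docs: List[str] = []
    for _ in range(max_iters):
        candidates = retrieve(query)
        # Update: merge new candidates into the current document set.
        docs = update(question, docs, candidates)
        # Verify: stop once the documents are judged sufficient.
        if verify(question, docs):
            break
        # Otherwise ask for what is still missing and retrieve again.
        query = missing_info_query(question, docs)
    return docs


# Toy stand-ins (hypothetical); a real system would call a retriever and an LLM.
CORPUS = {
    "capital": "Paris is the capital of France.",
    "river": "The Seine flows through Paris.",
}


def toy_retrieve(query: str) -> List[str]:
    # Return every document whose key appears in the query.
    return [text for key, text in CORPUS.items() if key in query.lower()]


def toy_verify(question: str, docs: List[str]) -> bool:
    # Pretend the question needs two supporting documents.
    return len(docs) >= 2


def toy_update(question: str, docs: List[str], candidates: List[str]) -> List[str]:
    # Keep existing documents, append unseen candidates, cap the set at five.
    merged = docs + [c for c in candidates if c not in docs]
    return merged[:5]


def toy_missing_info_query(question: str, docs: List[str]) -> str:
    # A fixed follow-up query standing in for an LLM-generated one.
    return "river"


result = verify_update_retrieval(
    "What is the capital of France and what flows through it?",
    toy_retrieve,
    toy_verify,
    toy_update,
    toy_missing_info_query,
)
```

In this toy run the first retrieval finds only the "capital" document, verification fails, and the follow-up query brings in the "river" document on the second iteration. Passing the components as callables keeps the loop independent of any particular retriever or LLM, which is a design choice of this sketch rather than something stated in the source.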

Paper Structure

This paper contains 35 sections, 5 equations, 7 figures, 12 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Verifiable generation (ALCE).
  • Figure 2: When vanilla retrieval overshadows the LLM's remarkable abilities in the pipeline, LLatrieval can fully apply the LLM's abilities to retrieval through verify-update iterations.
  • Figure 3: Retrieval Verification: We propose two ways to verify whether the documents can support answering the question. Missing-Info Query: We propose to let the LLM generate a missing-info query in two styles.
  • Figure 4: Retrieval update by progressive selection.
  • Figure 5: The generation quality of filtered examples over different thresholds.
  • ...and 2 more figures