LLatrieval: LLM-Verified Retrieval for Verifiable Generation
Xiaonan Li, Changtai Zhu, Linyang Li, Zhangyue Yin, Tianxiang Sun, Xipeng Qiu
TL;DR
This work tackles the bottleneck in verifiable generation by enabling the LLM to actively critique and refine retrieval through verify-update iterations, ensuring retrieved documents sufficiently support answers. The proposed LLatrieval framework uses classification and scoring to verify retrieval, progressive selection and missing-info queries to update the candidate set, and an iterative loop to converge on verifiable evidence. Empirical results on the ALCE benchmark demonstrate state-of-the-art gains in both correctness and verifiability across multiple datasets and LLMs, with analyses highlighting component contributions, threshold trade-offs, and cross-model generality. The approach offers a scalable path to more reliable, evidence-backed long-form generation, though it acknowledges latency and bias considerations as avenues for future improvement.
Abstract
Verifiable generation aims to let the large language model (LLM) generate text with supporting documents, which enables the user to flexibly verify the answer and makes the LLM's output more reliable. Retrieval plays a crucial role in verifiable generation. Specifically, the retrieved documents not only supplement knowledge to help the LLM generate correct answers, but also serve as supporting evidence for the user to verify the LLM's output. However, the widely used retrievers become the bottleneck of the entire pipeline and limit the overall performance. Their capabilities are usually inferior to those of LLMs, since they often have far fewer parameters and have not been demonstrated to scale well to the size of LLMs. If the retriever does not find the correct supporting documents, the LLM cannot generate a correct and verifiable answer, which overshadows the LLM's remarkable abilities. To address these limitations, we propose LLatrieval (Large Language Model Verified Retrieval), where the LLM updates the retrieval result until it verifies that the retrieved documents can sufficiently support answering the question. Thus, the LLM can iteratively provide feedback to retrieval and help the retrieval result fully support verifiable generation. Experiments show that LLatrieval significantly outperforms extensive baselines and achieves state-of-the-art results.
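The verify-update loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `verified_retrieval`, its parameters, and the toy `retrieve`, `verify`, and `update` stand-ins are all hypothetical names chosen for illustration. In the actual framework the verification and update steps are carried out by the LLM; here they are replaced by trivial string checks so that the control flow can run on its own.

```python
from typing import Callable, List

# Hypothetical type aliases; none of these names come from the paper.
Retrieve = Callable[[str], List[str]]
Verify = Callable[[str, List[str]], bool]
Update = Callable[[str, List[str]], List[str]]


def verified_retrieval(
    question: str,
    retrieve: Retrieve,
    verify: Verify,
    update: Update,
    max_iters: int = 4,
) -> List[str]:
    """Update the document set until `verify` accepts it or the budget runs out."""
    docs = retrieve(question)
    for _ in range(max_iters):
        if verify(question, docs):
            break  # the documents are judged sufficient to support an answer
        docs = update(question, docs)
    return docs


# Toy stand-ins (assumptions): a real system would call a retriever and an LLM.
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Eiffel Tower was completed in 1889.",
]


def toy_retrieve(question: str) -> List[str]:
    # Pretend the retriever returns only its top-1 (and unhelpful) document.
    return [CORPUS[0]]


def toy_verify(question: str, docs: List[str]) -> bool:
    # Stand-in for LLM verification: is the needed fact present?
    return any("1889" in doc for doc in docs)


def toy_update(question: str, docs: List[str]) -> List[str]:
    # Stand-in for the update step: add the next unseen candidate.
    unseen = [doc for doc in CORPUS if doc not in docs]
    return docs + unseen[:1]


evidence = verified_retrieval(
    "When was the Eiffel Tower completed?", toy_retrieve, toy_verify, toy_update
)
```

The `max_iters` cap is a design choice of this sketch: it bounds the number of update steps so the loop terminates even when verification never succeeds, which is also where the latency cost of repeated verification would appear.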
