LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation
Keheng Wang, Feiyu Duan, Peiguang Li, Sirui Wang, Xunliang Cai
TL;DR
This paper tackles the challenge of retrieving relevant, low-noise information for complex multi-hop queries in retrieval-augmented generation. It introduces MIGRES, a Missing Information Guided Retrieve-Extraction-Solving framework that identifies the information still missing at each reasoning step and uses it to generate targeted queries, combined with sentence-level filtering and robust information extraction to improve factuality. The architecture comprises a Main module, a Retrieval module with query generation and re-ranking, a Leaf module that extracts information with citations, and a Memory module for query diversification; these are applied iteratively until a conclusive answer is produced. Empirical results across multiple datasets show that MIGRES achieves superior or competitive performance under zero-shot conditions, with notable gains from sentence-level filtering and from the ability to fall back on the GPT model's internal knowledge when external retrieval is insufficient, while also reducing token consumption. Overall, MIGRES advances RAG by aligning retrieval with the model's epistemic gaps, thereby reducing hallucinations and improving efficiency in knowledge-intensive tasks.
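The iterative control flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function `migres_loop`, the `Memory` class, and the callable parameters `main`, `make_query`, `retrieve`, and `extract` are all hypothetical names, each standing in for an LLM or retriever call that the summary does not specify. The step cap `max_steps` is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Memory:
    """Hypothetical memory: records past queries so new ones can differ."""
    past_queries: List[str] = field(default_factory=list)


def migres_loop(
    question: str,
    # main(question, known) -> (answer or None, description of missing info)
    main: Callable[[str, List[str]], Tuple[Optional[str], str]],
    # make_query(missing, past_queries) -> a new, targeted search query
    make_query: Callable[[str, List[str]], str],
    # retrieve(query) -> candidate passages (already re-ranked/filtered)
    retrieve: Callable[[str], List[str]],
    # extract(missing, passages) -> useful facts found in the passages
    extract: Callable[[str, List[str]], List[str]],
    max_steps: int = 5,  # assumed cap; the summary states no limit
) -> Tuple[Optional[str], List[str]]:
    known: List[str] = []
    memory = Memory()
    for _ in range(max_steps):
        # Main step: answer if possible, otherwise name what is missing.
        answer, missing = main(question, known)
        if answer is not None:
            return answer, known
        # Retrieval step: query generation guided by the missing information.
        query = make_query(missing, memory.past_queries)
        memory.past_queries.append(query)
        passages = retrieve(query)
        # Leaf step: keep only newly extracted facts.
        for fact in extract(missing, passages):
            if fact not in known:
                known.append(fact)
    return None, known  # no conclusive answer within the step budget
```

Passing the components in as callables keeps the sketch independent of any particular LLM or search backend; in practice each would wrap a prompt or a retriever call.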
Abstract
Retrieval-Augmented Generation (RAG) demonstrates great value in alleviating outdated knowledge and hallucination by supplying LLMs with updated and relevant knowledge. However, RAG still faces several difficulties in understanding complex multi-hop queries and retrieving relevant documents, which require LLMs to reason and retrieve step by step. Inspired by the human reasoning process, in which people gradually search for the information they need, it is natural to ask whether LLMs can notice the information that is missing at each reasoning step. In this work, we first experimentally verify the ability of LLMs to extract information and to recognize what information is missing. Based on this finding, we propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES), in which we leverage the identification of missing information to generate a targeted query that steers the subsequent knowledge retrieval. In addition, we design a sentence-level re-ranking and filtering approach that removes irrelevant content from documents, and we use the information extraction capability of LLMs to extract useful information from the cleaned-up documents, which in turn bolsters the overall efficacy of RAG. Extensive experiments conducted on multiple public datasets demonstrate the superiority of the proposed MIGRES method, and analytical experiments demonstrate the effectiveness of the proposed modules.
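As a rough illustration of sentence-level filtering, the sketch below splits a document into sentences, scores each against the query, and keeps only those above a threshold. The abstract does not specify the scoring model, so a simple token-overlap score is used here purely as a stand-in for a real re-ranker; the function names and the threshold of 0.3 are assumptions.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def overlap_score(query: str, sentence: str) -> float:
    """Stand-in relevance score: fraction of query tokens found in the sentence."""
    q = set(re.findall(r"\w+", query.lower()))
    s = set(re.findall(r"\w+", sentence.lower()))
    return len(q & s) / len(q) if q else 0.0


def filter_sentences(query: str, document: str, threshold: float = 0.3) -> str:
    """Keep sentences scoring at or above the threshold, in original order."""
    kept = [
        sentence
        for sentence in split_sentences(document)
        if overlap_score(query, sentence) >= threshold
    ]
    return " ".join(kept)


doc = (
    "Jane Roe was born in Oslo. "
    "The weather was mild that year. "
    "She later moved to Paris."
)
print(filter_sentences("Where was Jane Roe born", doc))
# prints: Jane Roe was born in Oslo.
```

Replacing `overlap_score` with a learned relevance model would change only the scoring function; the split, score, and threshold structure stays the same.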
