
Crafting the Path: Robust Query Rewriting for Information Retrieval

Ingeol Baek, Jimin Lee, Joonho Yang, Hwanhee Lee

TL;DR

This paper tackles open-domain information retrieval by addressing the fragility of LLM-driven query rewriting that leans on model-internal knowledge. It introduces Crafting The Path, a three-step, structured rewriting framework that prioritizes identifying what information to fetch (concept comprehension, information type, and expected answers) over generating passages. By constructing a final reformulated query $q^+$ and training retrieval with a Binary Passage Retrieval loss, the method achieves superior robustness and lower factual error rates, especially in domains with limited model knowledge, and demonstrates strong performance in retrieval-augmented generation tasks. The approach yields better out-of-domain results and reduced latency compared to prior methods, with analyses confirming the essential role of each step and highlighting practical implications for real-world QA systems.
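The TL;DR mentions training retrieval with a Binary Passage Retrieval (BPR) loss. The sketch below assumes the loss follows the original BPR formulation (Yamada et al., 2021) rather than any variant specific to this paper. Embeddings are mapped to approximately binary codes with a scaled tanh. A hinge ranking loss over those codes drives candidate generation. A cross-entropy loss then scores the continuous query embedding against the binary passage codes for reranking. The function names, the plain-list embeddings, and the `beta` and `margin` defaults are illustrative assumptions.

```python
import math
from typing import List, Sequence


def soft_binarize(vec: Sequence[float], beta: float) -> List[float]:
    """Differentiable stand-in for sign(x): tanh(beta * x) approaches +/-1 as beta grows."""
    return [math.tanh(beta * x) for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bpr_style_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    beta: float = 10.0,   # assumed scale; BPR anneals this upward during training
    margin: float = 0.1,  # assumed hinge margin
) -> float:
    """Hypothetical BPR-style loss: candidate-generation hinge loss + reranking cross-entropy."""
    bq = soft_binarize(query, beta)
    bpos = soft_binarize(positive, beta)
    bnegs = [soft_binarize(n, beta) for n in negatives]

    # Candidate generation: the positive code should outscore each negative code by `margin`.
    pos_score = dot(bq, bpos)
    cand_loss = sum(max(0.0, margin - (pos_score - dot(bq, bn))) for bn in bnegs)

    # Reranking: continuous query embedding vs. binary passage codes,
    # softmax cross-entropy with the positive passage at index 0.
    logits = [dot(query, bpos)] + [dot(query, bn) for bn in bnegs]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    rerank_loss = log_sum_exp - logits[0]

    return cand_loss + rerank_loss


if __name__ == "__main__":
    q = [1.0, 1.0, 1.0, 1.0]
    # Positive aligned with the query: loss is near zero.
    print(bpr_style_loss(q, positive=[1.0] * 4, negatives=[[-1.0] * 4]))
    # Positive opposed to the query: loss is large.
    print(bpr_style_loss(q, positive=[-1.0] * 4, negatives=[[1.0] * 4]))
```

The two terms serve different purposes. The hinge term only needs binary codes, so it supports cheap Hamming-space candidate search. The cross-entropy term keeps the query continuous to sharpen the final ranking.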

Abstract

Query rewriting aims to generate a new query that complements the original query to improve information retrieval systems. Recent studies on query rewriting, such as query2doc, query2expand, and query2cot, rely on the internal knowledge of Large Language Models (LLMs) to generate a relevant passage that adds information to the query. However, the efficacy of these methods may decline markedly when the requisite knowledge is not encoded in the model's parameters. In this paper, we propose Crafting the Path, a novel structured query rewriting method tailored for retrieval systems. Crafting the Path is a three-step process in which each step crafts query-related information needed to find the passages to be retrieved. Specifically, Crafting the Path begins with Query Concept Comprehension, proceeds to Query Type Identification, and finally conducts Expected Answer Extraction. Experimental results show that our method outperforms previous rewriting methods, especially in domains less familiar to LLMs. We demonstrate that our method is less dependent on the model's internal parametric knowledge and generates queries with fewer factual inaccuracies. Furthermore, we observe that Crafting the Path achieves superior performance in retrieval-augmented generation scenarios.
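The three steps can be read as a small pipeline. Its outputs are appended to the original query to form the reformulated query. The Python sketch below rests on several assumptions, none taken from the paper:

- a generic `generate(prompt) -> str` callable stands in for the LLM;
- the one-line step instructions are hypothetical;
- each step sees the outputs of earlier steps;
- the rewritten query is a `[SEP]`-joined concatenation.

```python
from typing import Callable, List, Tuple

# Hypothetical one-line instructions per step; the paper's actual prompts are not reproduced here.
STEPS: List[Tuple[str, str]] = [
    ("Query Concept Comprehension", "Explain the key concepts mentioned in the query."),
    ("Query Type Identification", "Describe what type of information is needed to answer the query."),
    ("Expected Answer Extraction", "List the answers a relevant passage would be expected to contain."),
]


def craft_the_path(query: str, generate: Callable[[str], str], sep: str = " [SEP] ") -> str:
    """Run the three steps in order and append their outputs to the original query."""
    history: List[str] = []
    outputs: List[str] = []
    for step_name, instruction in STEPS:
        # Assumption: each step is conditioned on the query plus all earlier step outputs.
        prompt = "\n".join([f"Query: {query}", *history, f"{step_name}: {instruction}"])
        answer = generate(prompt).strip()
        outputs.append(answer)
        history.append(f"{step_name}: {answer}")
    # Reformulated query: original query, separator, then the crafted path text.
    return query + sep + " ".join(outputs)


if __name__ == "__main__":
    canned = iter(["concepts...", "a person's name", "possible authors..."])
    # A stub generator replaces the LLM so the sketch runs offline.
    print(craft_the_path("who wrote X?", lambda prompt: next(canned)))
```

Passing earlier outputs forward lets the expected-answer step build on the concepts and information type already identified. A single prompt that emits all three sections at once would be an equally plausible reading of the abstract.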

Paper Structure

This paper contains 29 sections, 8 equations, 3 figures, and 13 tables.

Figures (3)

  • Figure 1: Overview of our proposed query rewriting method Crafting The Path, along with the rewritten query examples of query2doc (Q2D) and query2cot (Q2C) methodologies. We represent the factual error in red and the accurate information in blue.
  • Figure 2: The Retrieval-Augmented Generation performance on HotpotQA (left) and NaturalQA (right) when using Crafting The Path (CTP), query2doc (Q2D), and query2cot (Q2C). K denotes the number of retrieved passages.
  • Figure 3: The length of the new query for each rewriting method in the MS-MARCO passage dev dataset.