
Disentangling Questions from Query Generation for Task-Adaptive Retrieval

Yoonsang Lee, Minsoo Kim, Seung-won Hwang

TL;DR

The paper tackles information retrieval on unseen tasks by decoupling query generation from the traditional question form and generating task-adaptive queries via meta-prompts. It presents EGG, an Efficient Generalized Generator, in two variants (EGG-FLAN and EGG-LLAMA) that incorporate explicit search intents and, when needed, prototype queries for in-context learning. When the retriever is trained with either DPR or GPL on BeIR-derived synthetic data, EGG significantly outperforms baselines on four BeIR tasks while using a query generator approximately 47x smaller than the prior state of the art. The results highlight the importance of explicit search-intent instructions and show that compact, task-aware query generation can robustly cover diverse intents and improve retrieval in practical settings.

Abstract

This paper studies how information retrieval can adapt to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes that the query is a question, thus failing to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend of equating query and question, and instead conceptualize the query generation task as a "compilation" of high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to the wide range of search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state of the art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.

Paper Structure (21 sections, 1 figure, 8 tables)

Figures (1)

  • Figure 1: Overview of our method. Given a document, a conventional zero-shot query generator generates questions, while a few-shot query generator performs in-context learning with few-shot examples. In contrast, our method reflects diverse search intents by utilizing meta-prompts to enhance the generalizability of the query generator. Fact checking can be viewed as transactional, where the retriever determines whether the given claim is supported or not, and argument retrieval is similar. Citation prediction, which presents a title as the query, represents a navigational intent for a specific document, and entity retrieval exhibits a mixture of the intents.
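
The idea of a meta-prompt in the caption above can be illustrated with a minimal Python sketch. This is not the authors' template: the function name `build_meta_prompt`, the `TASK_INTENTS` dictionary, and the wording of every intent description are assumptions made for illustration. The sketch only shows how an explicit search intent, optional prototype queries, and a document might be combined into one prompt for a query generator.

```python
from typing import Optional, Sequence

# Hypothetical intent descriptions; the wording is assumed for illustration
# and is not taken from the paper.
TASK_INTENTS = {
    "fact_checking": "a claim that the document supports or refutes",
    "argument_retrieval": "an argument related to the document",
    "citation_prediction": "the title of a paper that would cite the document",
    "entity_retrieval": "a short query about an entity in the document",
}


def build_meta_prompt(
    document: str,
    task: str,
    prototypes: Optional[Sequence[str]] = None,
) -> str:
    """Combine an explicit search intent, optional prototype queries,
    and a document into a single prompt string."""
    intent = TASK_INTENTS[task]
    lines = [f"Write {intent}."]
    # Prototype queries serve as in-context examples when they are available.
    for i, query in enumerate(prototypes or [], start=1):
        lines.append(f"Example query {i}: {query}")
    lines.append(f"Document: {document}")
    lines.append("Query:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_meta_prompt("Vitamin C is found in citrus fruit.", "fact_checking"))
```

Passing a list of `prototypes` adds one example line per query between the intent instruction and the document; omitting it yields an intent-only prompt. The resulting string would then be passed to whichever language model acts as the query generator.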