Disentangling Questions from Query Generation for Task-Adaptive Retrieval
Yoonsang Lee, Minsoo Kim, Seung-won Hwang
TL;DR
The paper tackles information retrieval for unseen tasks by decoupling query generation from the traditional question form and introducing task-adaptive queries via meta-prompts. It presents EGG, an Efficient Generalized Generator, with two variants (EGG-FLAN and EGG-LLAMA) that incorporate explicit search intents and, when needed, prototype queries for in-context learning. Trained with either DPR or GPL on BeIR-derived synthetic data, EGG significantly outperforms baselines on four BeIR tasks while using a query generator approximately 47 times smaller than the prior state of the art. The results highlight the importance of explicit search-intent instructions and show that compact, task-aware query generation can cover diverse intents and improve retrieval in practical settings.
Abstract
This paper studies how to adapt information retrieval to unseen tasks. Existing work generates synthetic queries from domain-specific documents to jointly train the retriever. However, the conventional query generator assumes that the query is a question, and thus fails to accommodate general search intents. A more lenient approach incorporates task-adaptive elements, such as few-shot learning with a 137B LLM. In this paper, we challenge the trend of equating queries with questions, and instead conceptualize the query generation task as a "compilation" of high-level intent into a task-adaptive query. Specifically, we propose EGG, a query generator that better adapts to the wide range of search intents expressed in the BeIR benchmark. Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state of the art. Our findings reveal that instructing the LM with explicit search intent is a key aspect of modeling an effective query generator.
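To make the idea of "compiling" a high-level intent into a task-adaptive query more concrete, the following is a minimal sketch of how a prompt for a query generator might be assembled from a search intent, optional prototype queries, and a document. The function name `build_meta_prompt`, the field labels ("Task:", "Example query:", "Document:", "Query:"), and the example strings are all assumptions made for illustration; they are not the template used by EGG.

```python
from typing import Optional, Sequence


def build_meta_prompt(
    document: str,
    search_intent: str,
    prototype_queries: Optional[Sequence[str]] = None,
) -> str:
    """Assemble a prompt asking a language model to write a query for `document`.

    Hypothetical template: the labels and their order are illustrative
    assumptions, not the format used in the paper.
    """
    # State the search intent explicitly instead of assuming a question form.
    parts = [f"Task: {search_intent}"]
    # Optionally add prototype queries as in-context examples.
    for example in prototype_queries or []:
        parts.append(f"Example query: {example}")
    # The document the generated query should retrieve.
    parts.append(f"Document: {document}")
    # Leave the final slot open for the model to complete.
    parts.append("Query:")
    return "\n".join(parts)


# Illustrative inputs (invented for this sketch).
prompt = build_meta_prompt(
    document="School uniforms reduce visible income differences among students.",
    search_intent="Write an argument that opposes the claim in the document.",
    prototype_queries=["Uniforms limit students' freedom of expression."],
)
print(prompt)
```

The resulting string would be passed to a language model, and the generated queries paired with their source documents to train a retriever. Keeping the intent as an explicit argument means the same function can serve tasks whose queries are not questions.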
