QUESTER: Query Specification for Generative Retrieval
Arthur Satouf, Yuxuan Zong, Habiboulaye Amadou-Boubacar, Pablo Piantanida, Benjamin Piwowarski
TL;DR
QUESTER reframes Generative Retrieval (GR) as learning to produce keyword-based query specifications that a BM25 engine processes, addressing the scaling and generalization challenges of DocID-based GR approaches. It trains a small LLM with Group-Relative Policy Optimization (GRPO), using a SoftRank-based reward and cross-encoder distillation to guide learning. Empirical results on MS MARCO and BEIR show QUESTER outperforming BM25 in both in-domain and out-of-domain settings, with a latency of around 28 ms per query when using a 4B backbone. By building on established search technology and producing interpretable query specifications, QUESTER offers a scalable alternative to large dense or generative IR models and lays a foundation for future work on structured query languages and hybrid backends.
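To make the "group-relative" part of GRPO concrete, the sketch below standardizes rewards within one group of outputs sampled for the same input, which is how GRPO is commonly formulated. It is an illustration only, not the authors' training code: the function name, the `eps` constant, and the reward values are assumptions, and the SoftRank-based reward itself is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled outputs.

    Each output is scored relative to the other samples drawn for the
    same input, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps (an assumed constant) avoids division by zero when all rewards match.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four query specifications sampled for one user query.
rewards = [0.10, 0.40, 0.40, 0.70]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged. When every sample in a group earns the same reward, all advantages are zero and the group contributes no learning signal.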
Abstract
Generative Retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance in model parameters and directly generating document identifiers. However, GR often struggles to generalize and is costly to scale. We introduce QUESTER (QUEry SpecificaTion gEnerative Retrieval), which reframes GR as query specification generation (in this work, a simple keyword query handled by BM25) using a small LLM. The policy is trained with reinforcement learning, specifically GRPO. Across in-domain and out-of-domain evaluations, we show that our model is more effective than BM25 and competitive with neural IR models, while remaining efficient.
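As a rough illustration of the retrieval side, the sketch below scores a toy corpus against a keyword query with a compact BM25 implementation. It assumes whitespace tokenization, the Lucene-style IDF variant, and common default values for `k1` and `b`; the corpus, the user query, and the "generated" specification are all hypothetical, and the LLM step is replaced by a hard-coded string.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    return text.lower().split()


class BM25:
    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_tfs = [Counter(tokenize(d)) for d in docs]
        self.doc_lens = [sum(tf.values()) for tf in self.doc_tfs]
        self.n = len(docs)
        self.avgdl = sum(self.doc_lens) / self.n
        # Document frequency: number of documents containing each term.
        self.df = Counter(t for tf in self.doc_tfs for t in tf)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        tf, dl = self.doc_tfs[i], self.doc_lens[i]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in tokenize(query)
            if t in tf
        )

    def rank(self, query):
        # Document indices sorted by decreasing BM25 score.
        return sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)


docs = [
    "the eiffel tower is in paris",
    "bm25 is a ranking function used by search engines",
    "large language models generate text",
]
index = BM25(docs)

user_query = "how do retrieval systems order results"
# Hypothetical keyword specification an LLM might emit for user_query.
generated_spec = "bm25 ranking function search engines"

ranking = index.rank(generated_spec)
```

In this toy setup the raw user query shares no terms with any document, so BM25 alone cannot separate them, whereas the keyword specification matches the relevant document directly. Because the specification is plain text, it can also be inspected by a person, which is the interpretability benefit the abstract alludes to.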
