GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation
Kaustubh Dhole, Eugene Agichtein
TL;DR
This work tackles vocabulary mismatch and ambiguous intent in information retrieval by strengthening zero-shot query reformulation (QR) through an ensemble of paraphrased instructions. The authors introduce GenQREnsemble, which prompts an LLM with several paraphrases of a base QR instruction to generate multiple keyword sets, and GenQREnsembleRF, which extends this with post-retrieval relevance feedback. Across four IR benchmarks, GenQREnsemble yields relative pre-retrieval gains of up to $18\%$ in $nDCG@10$ and $24\%$ in $MAP$ over the previous zero-shot state of the art. On MSMarco passage ranking, GenQREnsembleRF adds relative gains of up to $5\%$ in $MRR$ with pseudo-relevance feedback and $9\%$ in $nDCG@10$ with relevant feedback documents. The results support the efficacy and robustness of instruction ensembles for QR. The approach has practical implications for search effectiveness and may apply to related query-centric tasks; its potential latency cost can be mitigated with batch inference.
Abstract
Query Reformulation (QR) is a set of techniques used to transform a user's original search query into text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has been shown to be a promising approach due to its ability to exploit knowledge inherent in large language models. Taking inspiration from the success of ensemble prompting strategies, which have benefited many tasks, we investigate whether they can help improve query reformulation. In this context, we propose an ensemble-based prompting technique, GenQREnsemble, which leverages paraphrases of a zero-shot instruction to generate multiple sets of keywords, ultimately improving retrieval performance. We further introduce its post-retrieval variant, GenQREnsembleRF, to incorporate pseudo-relevance feedback. In evaluations over four IR benchmarks, we find that GenQREnsemble generates better reformulations, with relative nDCG@10 improvements of up to 18% and MAP improvements of up to 24% over the previous zero-shot state of the art. On the MSMarco Passage Ranking task, GenQREnsembleRF shows relative gains of 5% MRR using pseudo-relevance feedback, and 9% nDCG@10 using relevant feedback documents.
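The ensemble idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the instruction paraphrases, the prompt template, the comma-separated output format, the de-duplication step, and the `toy_generate` stand-in for an LLM call are all assumptions made for illustration.

```python
from typing import Callable, Iterable, List

# Hypothetical paraphrases of a base query-reformulation instruction.
INSTRUCTIONS: List[str] = [
    "Improve the search effectiveness by suggesting expansion terms for the query",
    "Recommend expansion terms for the query to improve search results",
    "Suggest additional keywords that help retrieve relevant documents for the query",
]


def ensemble_reformulate(
    query: str,
    instructions: Iterable[str],
    generate: Callable[[str], str],
) -> str:
    """Prompt once per instruction paraphrase and append the merged keywords to the query.

    `generate` is any function mapping a prompt string to a comma-separated
    keyword string (an assumed output format).
    """
    keywords: List[str] = []
    seen = set()
    for instruction in instructions:
        prompt = f"{instruction}: {query}"  # assumed prompt template
        for keyword in generate(prompt).split(","):
            keyword = keyword.strip().lower()
            # Assumption: drop empty strings and exact duplicates, keep first-seen order.
            if keyword and keyword not in seen:
                seen.add(keyword)
                keywords.append(keyword)
    return " ".join([query] + keywords)


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call: returns canned keywords keyed on the instruction."""
    if prompt.startswith("Improve"):
        return "jaguar speed, big cat"
    if prompt.startswith("Recommend"):
        return "big cat, top speed"
    return "jaguar habitat"


reformulated = ensemble_reformulate("how fast is a jaguar", INSTRUCTIONS, toy_generate)
print(reformulated)
```

In practice `toy_generate` would be replaced by a call to an instruction-tuned language model, and the prompts for all paraphrases could be sent as one batch to limit the added latency.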
