GenQREnsemble: Zero-Shot LLM Ensemble Prompting for Generative Query Reformulation

Kaustubh Dhole, Eugene Agichtein

TL;DR

This work tackles vocabulary mismatch and ambiguous intent in information retrieval by strengthening zero-shot query reformulation (QR) with an ensemble of paraphrased instructions. The authors introduce GenQREnsemble, which prompts an LLM with several paraphrases of the base QR instruction to generate multiple keyword sets, and GenQREnsembleRF, a post-retrieval variant that incorporates relevance feedback. Across four IR benchmarks, GenQREnsemble yields relative pre-retrieval gains of up to $18\%$ in $nDCG@10$ and $24\%$ in $MAP$ over the previous zero-shot state of the art. On the MSMarco Passage Ranking task, GenQREnsembleRF yields further relative gains of $5\%$ in $MRR$ with pseudo-relevance feedback and $9\%$ in $nDCG@10$ with relevant feedback documents. The results support the efficacy and robustness of instruction ensembles for QR, with practical implications for search effectiveness. The approach may add latency, which can be mitigated with batch inference, and it may also apply to related query-centric tasks.

Abstract

Query Reformulation (QR) is a set of techniques used to transform a user's original search query into text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has been shown to be a promising approach due to its ability to exploit knowledge inherent in large language models. Taking inspiration from the success of ensemble prompting strategies, which have benefited many tasks, we investigate whether they can help improve query reformulation. In this context, we propose an ensemble-based prompting technique, GenQREnsemble, which leverages paraphrases of a zero-shot instruction to generate multiple sets of keywords, ultimately improving retrieval performance. We further introduce its post-retrieval variant, GenQREnsembleRF, to incorporate pseudo-relevance feedback. In evaluations over four IR benchmarks, we find that GenQREnsemble generates better reformulations, with relative nDCG@10 improvements of up to 18% and MAP improvements of up to 24% over the previous zero-shot state of the art. On the MSMarco Passage Ranking task, GenQREnsembleRF shows relative gains of 5% MRR using pseudo-relevance feedback, and 9% nDCG@10 using relevant feedback documents.
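
The ensemble idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the prompt template, the instruction wording, and the canned keywords are all hypothetical, and `toy_generate` stands in for a real LLM call. The sketch also assumes that the generated keywords are appended to the original query, which the text above does not spell out.

```python
from typing import Callable, List, Sequence


def ensemble_reformulate(
    query: str,
    instructions: Sequence[str],
    generate: Callable[[str], str],
) -> str:
    """Expand `query` with the keywords generated under each instruction."""
    keywords: List[str] = []
    for instruction in instructions:
        # One zero-shot prompt per paraphrase of the base instruction.
        # The "<instruction>: <query>" template is an assumption.
        prompt = f"{instruction}: {query}"
        for keyword in generate(prompt).split(","):
            keyword = keyword.strip()
            if keyword:
                keywords.append(keyword)
    # Assumption: all keyword sets are appended to the original query.
    return " ".join([query, *keywords])


# Hypothetical paraphrases; the paper's actual instructions are in Figure 3.
INSTRUCTIONS = [
    "Suggest expansion terms for the query",
    "Recommend keywords that would improve search results for the query",
]


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM: returns canned, comma-separated keywords."""
    canned = {
        INSTRUCTIONS[0]: "goldfish size, growth rate",
        INSTRUCTIONS[1]: "growth rate, tank size",
    }
    instruction = prompt.split(":", 1)[0]
    return canned.get(instruction, "")


if __name__ == "__main__":
    print(ensemble_reformulate("do goldfish grow", INSTRUCTIONS, toy_generate))
```

Because the model call is passed in as a plain callable, the same loop could wrap a batched LLM client, which is one way to reduce the latency cost noted in the TL;DR. In this sketch, a keyword produced under several instructions appears several times in the expanded query. Whether to keep or remove such repeats is a design choice left open here.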

Paper Structure

This paper contains 9 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Keywords generated for the query ("do goldfish grow") differ drastically when generated from two paraphrastic instructions given to flan-t5-xxl.
  • Figure 2: The complete flow, with the algorithm shown at the top right.
  • Figure 3: Reformulation instructions generated ($N = 10$).
  • Figure 4: nDCG@10 scores of GenQREnsemble and FlanQR relative to BM25.
  • Figure 5: Effect of feedback documents under sparse (BM25) and neural (MonoT5) rankers.