Revisiting Query Variants: The Advantage of Retrieval Over Generation of Query Variants for Effective QPP
Fangzheng Tian, Debasis Ganguly, Craig Macdonald
TL;DR
The paper addresses query performance prediction (QPP) for neural IR models by replacing or augmenting generated query variants (QVs) with QVs retrieved from a large training set. It introduces a two-step neighbourhood expansion (1-hop and 2-hop) to retrieve QVs with high recall, and uses a re-ranked, similarity-weighted aggregation to produce improved QPP signals. Empirically, retrieved QVs consistently outperform generated QVs, and 2-hop QVs offer additional gains, with improvements of up to about 20% on neural ranking models such as MonoT5. The approach demonstrates practical value for QPP in real-world systems and suggests future work that integrates retrieval-based QVs with LLM-driven QV generation to further reduce topical drift. Overall, the work provides a retrieval-centered framework for improving QPP for neural IR, with guidance on hyperparameter sensitivity and an evaluation on standard benchmarks.
Abstract
Leveraging query variants (QVs), i.e., queries with potentially similar information needs to the target query, has been shown to improve the effectiveness of query performance prediction (QPP) approaches. Existing QV-based QPP methods generate QVs using either query expansion or non-contextual embeddings, which may introduce topical drift and hallucinations. In this paper, we propose a method that retrieves QVs from a training set (e.g., MS MARCO) for a given target query. To achieve high recall in retrieving the training-set queries whose information needs are most similar to that of the target query, we extend the directly retrieved QVs (1-hop QVs) by a second retrieval using their denoted relevant documents (which yields 2-hop QVs). Our experiments, conducted on TREC DL'19 and DL'20, show that QPP methods using QVs retrieved by our method outperform the best-performing existing generated-QV-based QPP approaches by as much as around 20% on neural ranking models like MonoT5.
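The 1-hop and 2-hop retrieval described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`jaccard`, `one_hop`, `two_hop`, `aggregate`), the toy training set, and the per-variant scores are all hypothetical. Token-overlap similarity stands in for a real retriever, and 2-hop QVs are approximated here as training queries that share a labelled relevant document with a 1-hop QV, which is a simplifying assumption about how the second retrieval works.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Toy lexical similarity between two queries (stand-in for a real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def one_hop(target: str, train: Dict[str, Set[str]], k: int) -> List[Tuple[str, float]]:
    """Return the k training queries most similar to the target (1-hop QVs)."""
    scored = [(q, jaccard(target, q)) for q in train if q != target]
    scored = [(q, s) for q, s in scored if s > 0.0]
    return sorted(scored, key=lambda x: (-x[1], x[0]))[:k]


def two_hop(target: str, hop1: List[Tuple[str, float]],
            train: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Assumed 2-hop rule: training queries sharing a relevant doc with a 1-hop QV."""
    seen = {q for q, _ in hop1}
    docs = set().union(*(train[q] for q in seen))
    extra = [q for q in train
             if q not in seen and q != target and train[q] & docs]
    # Re-rank the newly found variants by their similarity to the target.
    return sorted(((q, jaccard(target, q)) for q in extra),
                  key=lambda x: (-x[1], x[0]))


def aggregate(qv_scores: Dict[str, float], qvs: List[Tuple[str, float]]) -> float:
    """Similarity-weighted average of per-variant QPP scores."""
    total = sum(sim for _, sim in qvs)
    if total == 0.0:
        return 0.0
    return sum(sim * qv_scores[q] for q, sim in qvs) / total


# Hypothetical training set: query -> ids of its labelled relevant documents.
train = {
    "symptoms of flu": {"d1", "d2"},
    "how long does flu last": {"d2", "d3"},
    "influenza duration in adults": {"d3"},
    "best pizza dough recipe": {"d9"},
}
target = "flu symptoms in adults"

hop1 = one_hop(target, train, k=1)
hop2 = two_hop(target, hop1, train)

# Made-up QPP scores for each variant (e.g. from any base predictor).
qv_scores = {"symptoms of flu": 0.8, "how long does flu last": 0.4}
estimate = aggregate(qv_scores, hop1 + hop2)
```

In this toy run the 2-hop step recovers a query that the top-1 lexical match alone would miss, because it shares a relevant document with the 1-hop variant. A real system would replace `jaccard` with a proper retrieval model and would typically combine the aggregated value with the target query's own predictor score.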
