Combining Query Performance Predictors: A Reproducibility Study

Sourav Saha; Suchana Datta; Dwaipayan Roy; Mandar Mitra; Derek Greene

TL;DR

This work examines whether combining predictors improves Query Performance Prediction (QPP), revisiting the 2009 findings of Hauff et al. in light of post-retrieval and neural methods, newer evaluation metrics, and larger datasets. It extends the prior study with 10 pre-retrieval and 10 post-retrieval predictors, uses the metrics $\rho$, $\tau$, $RMSE$, and $sMARE$, and evaluates on TREC Robust, ClueWeb09B (CW09B), and MS MARCO. The results show that predictor fusion offers limited, context-dependent gains: it helps mainly when predictor scores are not highly correlated, whereas strong positive or negative correlations reduce or negate the benefit. Neural post-retrieval predictors often dominate, and fusion benefits shrink on larger collections. The findings provide guidance on when fusion is advantageous and contribute reproducible artifacts and extended datasets for the QPP community.

Abstract

A large number of approaches to Query Performance Prediction (QPP) have been proposed over the last two decades. As early as 2009, Hauff et al. [28] explored whether different QPP methods may be combined to improve prediction quality. Since then, significant research has been done on both QPP approaches and their evaluation. This study revisits Hauff et al.'s work to assess the reproducibility of their findings in the light of new prediction methods, evaluation metrics, and datasets. We expand the scope of the earlier investigation by: (i) considering post-retrieval methods, including supervised neural techniques (only pre-retrieval techniques were studied in [28]); (ii) using sMARE for evaluation, in addition to the traditional correlation coefficients and RMSE; and (iii) experimenting with additional datasets (ClueWeb09B and TREC DL). Our results largely support previous claims, but we also present several interesting findings. We interpret these findings by taking a more nuanced look at the correlation between QPP methods, examining whether they capture diverse information or rely on overlapping factors.
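
To make the evaluation measures named above concrete, the following is a minimal Python sketch, not the authors' code. It assumes sMARE is the scaled Mean Absolute Rank Error as commonly defined (the absolute difference between a query's rank under the predictor and its rank under the true effectiveness, divided by the number of queries, then averaged over queries). The function names, the tie-breaking rule, the use of Kendall's tau-a, and the toy scores are all illustrative assumptions.

```python
from math import sqrt


def ranks(values):
    """Ordinal ranks, 1 = highest value. Ties are broken by input order
    (an assumption; other tie-handling schemes are possible)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def smare(predicted, actual):
    """Mean over queries of |rank_pred - rank_actual| / number_of_queries."""
    n = len(predicted)
    rp, ra = ranks(predicted), ranks(actual)
    return sum(abs(p - a) for p, a in zip(rp, ra)) / (n * n)


def rmse(predicted, actual):
    """Root mean squared error; assumes predictions are already on the
    same scale as the effectiveness measure (e.g. after a regression fit)."""
    n = len(predicted)
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# Hypothetical per-query values: true average precision and predictor scores.
ap = [0.10, 0.40, 0.25, 0.60]
pred = [0.20, 0.30, 0.50, 0.70]

print("sMARE:", smare(pred, ap))
print("tau:  ", kendall_tau(pred, ap))
print("RMSE: ", rmse(pred, ap))
```

Lower sMARE and RMSE indicate better prediction, while a higher correlation coefficient does. Because sMARE is built from per-query rank errors, it can be averaged and compared across queries, which a single collection-level correlation value does not allow.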
