
Combining Query Performance Predictors: A Reproducibility Study

Sourav Saha, Suchana Datta, Dwaipayan Roy, Mandar Mitra, Derek Greene

TL;DR

This work addresses how to improve Query Performance Prediction (QPP) by combining predictors, revisiting Hauff et al.'s 2009 findings in light of post-retrieval and neural methods, newer evaluation metrics, and larger datasets. It extends the prior study with 10 pre-retrieval and 10 post-retrieval predictors, uses metrics such as $\rho$, $\tau$, RMSE, and sMARE, and evaluates on TREC Robust, CW09B, and MS MARCO. The results show that fusing predictors yields limited, context-dependent gains. Fusion helps mainly when the predictors' scores are not highly correlated, whereas strong positive or negative correlations reduce or negate its benefit. Neural post-retrieval predictors often dominate, and fusion gains shrink on larger collections. The findings offer guidance on when fusion is worthwhile, and the study contributes reproducible artifacts and extended datasets to the QPP community.
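
The evaluation metrics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' evaluation code. It assumes that $\rho$ and $\tau$ are computed between a predictor's per-query scores and the per-query effectiveness (e.g. AP). It also assumes the usual definition of sMARE: the mean absolute difference between each query's predicted rank and actual rank, scaled by the number of queries. The helper names (`smare`, `kendall_tau_a`, and so on) are hypothetical. Tau-a is used for brevity, whereas statistics libraries often report tau-b, which corrects for ties.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation; applied to rank vectors it gives Spearman's rho."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs, no tie correction."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def rmse(predicted, actual):
    """Root mean squared error; only meaningful if predictions share the metric's scale."""
    return math.sqrt(mean((p - a) ** 2 for p, a in zip(predicted, actual)))


def smare(predicted, actual):
    """Scaled mean absolute rank error: mean |rank_pred - rank_actual| / #queries (lower is better)."""
    n = len(actual)
    rp, ra = average_ranks(predicted), average_ranks(actual)
    return mean(abs(a - b) for a, b in zip(rp, ra)) / n


predicted = [0.9, 0.2, 0.5, 0.7]  # toy predictor scores for four queries
actual_ap = [0.8, 0.1, 0.6, 0.4]  # toy per-query AP values
print(round(kendall_tau_a(predicted, actual_ap), 3))  # 0.667
print(smare(predicted, actual_ap))                    # 0.125
```

The correlation coefficients reward a predictor that orders queries correctly overall. sMARE instead averages per-query rank errors, so it can be analysed query by query.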

Abstract

A large number of approaches to Query Performance Prediction (QPP) have been proposed over the last two decades. As early as 2009, Hauff et al. [28] explored whether different QPP methods may be combined to improve prediction quality. Since then, significant research has been done both on QPP approaches and on their evaluation. This study revisits Hauff et al.'s work to assess the reproducibility of their findings in the light of new prediction methods, evaluation metrics, and datasets. We expand the scope of the earlier investigation by: (i) considering post-retrieval methods, including supervised neural techniques (only pre-retrieval techniques were studied in [28]); (ii) using sMARE for evaluation, in addition to the traditional correlation coefficients and RMSE; and (iii) experimenting with additional datasets (ClueWeb09B and TREC DL). Our results largely support the earlier claims, but we also report several new findings. We interpret these findings by taking a more nuanced look at the correlation between QPP methods, examining whether they capture diverse information or rely on overlapping factors.
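
The idea of combining QPP methods can be illustrated with one simple, unsupervised fusion scheme. Each predictor's scores are first z-normalised across queries so that different scales become comparable, and the normalised scores are then averaged (optionally with weights) per query. This is an assumption for illustration only. It is not necessarily the combination method used by Hauff et al. [28] or in this study, and `fuse_predictors` and its `weights` argument are hypothetical names.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardise one predictor's per-query scores to zero mean, unit variance."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        # A constant predictor carries no ranking information.
        return [0.0] * len(scores)
    return [(s - mu) / sd for s in scores]


def fuse_predictors(predictor_scores, weights=None):
    """Combine several predictors into one score per query.

    predictor_scores: dict mapping predictor name -> list of per-query scores,
    all lists in the same query order. weights: optional dict name -> weight
    (defaults to equal weights).
    """
    names = list(predictor_scores)
    lengths = {len(predictor_scores[n]) for n in names}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same queries")
    n_queries = lengths.pop()
    if weights is None:
        weights = {n: 1.0 for n in names}
    normalised = {n: zscore(predictor_scores[n]) for n in names}
    total = sum(weights[n] for n in names)
    return [
        sum(weights[n] * normalised[n][q] for n in names) / total
        for q in range(n_queries)
    ]


# Toy example: two hypothetical predictors that disagree on queries 2 and 3.
fused = fuse_predictors({"pred_a": [1, 2, 3], "pred_b": [10, 30, 20]})
print([round(f, 3) for f in fused])  # [-1.225, 0.612, 0.612]
```

Equal weights are a neutral default. A supervised variant would instead learn the weights, for example by regressing per-query AP on the predictor scores over training queries. Averaging two nearly identical predictors adds little new information, which fits the TL;DR's observation that fusion helps mainly when predictor scores are not highly correlated.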

Paper Structure

This paper contains 12 sections, 1 figure, and 4 tables.

Figures (1)

  • Figure 1: Heatmap of the rank correlation, measured with $\rho$, between the individual per-query scores of each pair of QPP methods; lighter cells indicate higher correlation. The top row shows correlations among pre-retrieval methods and the bottom row among post-retrieval methods, for each collection.
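
The heatmap entries in Figure 1 correspond to pairwise rank correlations between predictors' per-query scores, and these can be sketched as below. This is a minimal sketch that assumes $\rho$ denotes Spearman's rank correlation (the caption describes it as a rank correlation). It also assumes every predictor scores the same queries in the same order. `correlation_matrix` is a hypothetical helper, and the scores are toy values rather than data from the paper.

```python
import math
from itertools import combinations
from statistics import mean


def ranks(values):
    """1-based ranks, averaging the positions of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when a predictor is constant
    return cov / (sx * sy)


def correlation_matrix(predictor_scores):
    """Pairwise Spearman rho between predictors scored on the same queries."""
    names = sorted(predictor_scores)
    matrix = {(n, n): 1.0 for n in names}
    for a, b in combinations(names, 2):
        rho = spearman_rho(predictor_scores[a], predictor_scores[b])
        matrix[(a, b)] = matrix[(b, a)] = rho
    return names, matrix


# Toy scores for three hypothetical predictors over four queries.
scores = {
    "p1": [0.3, 0.1, 0.5, 0.2],
    "p2": [0.6, 0.2, 0.9, 0.4],  # same query ordering as p1
    "p3": [0.2, 0.5, 0.1, 0.4],  # exactly reversed ordering
}
names, m = correlation_matrix(scores)
for a in names:
    print(a, [round(m[(a, b)], 2) for b in names])
# p1 [1.0, 1.0, -1.0]
# p2 [1.0, 1.0, -1.0]
# p3 [-1.0, -1.0, 1.0]
```

Pairs with $\rho$ near $\pm 1$, such as p1/p2 and p1/p3 here, order the queries in the same or mirrored way. Fusing them therefore adds little. Pairs with $\rho$ near zero are the kind the TL;DR identifies as more likely to benefit from fusion.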