Can we predict QPP? An approach based on multivariate outliers
Adrian-Gabriel Chifu, Sébastien Déjean, Moncef Garouani, Josiane Mothe, Diégo Ortiz, Md Zia Ullah
TL;DR
The paper tackles the limited accuracy of Query Performance Prediction (QPP) by examining predictability itself. It applies multivariate outlier detection based on Transformed Rank Correlations (TRC) to identify queries for which QPP tends to fail: queries are flagged when their multivariate distance exceeds a threshold set at the 0.95 level of the F-distribution, and the analysis is repeated with multiple predictors. The study shows that hard-to-predict queries exist and that removing them yields higher correlations between predicted and actual performance across datasets and QPPs, which indicates that the approach is robust beyond a single predictor. The work proposes a new research direction: measuring the uncertainty of QPP and potentially abstaining from prediction when it is high, as well as extending the analysis to more datasets and to models that combine features.
Abstract
Query performance prediction (QPP) aims to forecast the effectiveness of a search engine across a range of queries and documents. While state-of-the-art predictors offer a certain level of precision, their accuracy is far from perfect. Prior research has recognized the challenges inherent in QPP but often lacks a thorough qualitative analysis. In this paper, we delve into QPP by examining the factors that influence how accurately query performance can be predicted. Our working hypothesis is that while some queries are readily predictable, others present significant challenges. By focusing on outliers, we aim to identify the queries that are particularly hard to predict. To this end, we employ a multivariate outlier detection method. Our results demonstrate the effectiveness of this approach in identifying queries on which QPP does not perform well and thus yields less reliable predictions. Moreover, we provide evidence that excluding these hard-to-predict queries from the analysis significantly enhances the overall accuracy of QPP.
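The idea of flagging hard-to-predict queries and then re-measuring the correlation can be illustrated with a minimal sketch. It is not the paper's TRC procedure: as a simplification it assumes a classical (non-robust) Mahalanobis distance on (predicted score, actual effectiveness) pairs, and it swaps the F-distribution threshold for the 0.95 quantile of a chi-square distribution with two degrees of freedom, which has the closed form -2 ln(1 - 0.95). The function names and the toy values are invented for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def mahalanobis_sq(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair to the sample mean,
    using the classical sample covariance (assumption: not the robust TRC estimate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    cyy = sum((y - my) ** 2 for y in ys) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = cxx * cyy - cxy ** 2
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        dists.append((cyy * dx * dx - 2 * cxy * dx * dy + cxx * dy * dy) / det)
    return dists

def flag_outliers(xs, ys, level=0.95):
    """Indices whose squared distance exceeds the chi-square(2) quantile at `level`
    (assumption: stands in for the F-distribution threshold named in the text)."""
    cutoff = -2.0 * math.log(1.0 - level)
    return [i for i, d in enumerate(mahalanobis_sq(xs, ys)) if d > cutoff]

# Toy data (made up): one predictor score and one effectiveness value per query.
# The last query gets a high predicted score but has zero actual effectiveness.
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 10.0]
actual = [0.05, 0.15, 0.10, 0.20, 0.30, 0.25, 0.35, 0.45, 0.40, 0.50, 0.55, 0.00]

flagged = flag_outliers(predicted, actual)
kept = [i for i in range(len(predicted)) if i not in flagged]

r_before = pearson(predicted, actual)
r_after = pearson([predicted[i] for i in kept], [actual[i] for i in kept])

print("flagged queries:", flagged)
print("correlation before / after removal:", round(r_before, 3), round(r_after, 3))
```

On this toy set only the deliberately mismatched query is flagged, and the correlation computed on the remaining queries is higher than on the full set. A classical covariance estimate is itself distorted by outliers, which is one reason a robust, rank-based estimator such as TRC may be preferable on real data.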
