Predicting IR Personalization Performance using Pre-retrieval Query Predictors
Eduardo Vicente-López, Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete
TL;DR
The paper tackles the problem of predicting when IR personalization will improve or degrade performance by leveraging a broad suite of pre-retrieval predictors, including user-profile information. It systematically extends predictors to incorporate profiles (yielding 37 predictors) and analyzes their correlations with the personalization delta $diffPerso$, finding no single robust predictor. To boost predictive power, the authors employ per-profile Random Forest classification and regression, achieving about one-third of the ideal improvement by disabling personalization for queries predicted to be harmed by it; the ASPIRE-based results (≈39% of the ideal gain) outperform those of the user study. A feature-reduction experiment shows that using the top 10 predictors offers nearly the same gains as using all 37, which matters when latency is critical. Overall, this work provides a promising framework for pre-retrieval personalization decisions and highlights directions for richer profile-aware predictors and future enhancements.
Abstract
Personalization generally improves query performance, but in a few cases it may also harm it. If we can predict those cases and disable personalization for them, overall performance will be higher and users will be more satisfied with personalized systems. To this end, we use several state-of-the-art pre-retrieval query performance predictors and propose others that incorporate user profile information. We study the correlations between these predictors and the performance difference between the personalized and the original queries. We also use classification and regression techniques to improve the results, ultimately reaching slightly more than one third of the maximum ideal performance. We think this is a good starting point within this research line, which certainly needs more effort and improvements.
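To make the "fraction of the maximum ideal performance" idea concrete, the following minimal sketch shows one way such a figure could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the toy scores, and the exact definition used here (gain over always-personalizing, divided by the gain of an oracle that disables personalization exactly when it hurts) are all hypothetical. The `keep_personalization` flags stand in for the output of any predictor, such as a classifier over pre-retrieval features.

```python
from statistics import mean


def diff_perso(original, personalized):
    """Per-query performance difference: personalized minus original."""
    return [p - o for o, p in zip(original, personalized)]


def gated_scores(original, personalized, keep_personalization):
    """Per-query score when personalization is used only where the flag is True."""
    return [
        p if keep else o
        for o, p, keep in zip(original, personalized, keep_personalization)
    ]


def fraction_of_ideal_gain(original, personalized, keep_personalization):
    """Share of the oracle's gain over always-personalizing that the flags achieve.

    Assumed definition (not taken from the paper): the oracle picks, per query,
    whichever of the two scores is higher.
    """
    baseline = mean(personalized)  # always personalize
    oracle = mean(max(o, p) for o, p in zip(original, personalized))
    achieved = mean(gated_scores(original, personalized, keep_personalization))
    ideal_gain = oracle - baseline
    if ideal_gain == 0:
        return 0.0  # personalization never hurts, so nothing can be gained
    return (achieved - baseline) / ideal_gain


# Toy, made-up per-query scores (e.g. some effectiveness metric in [0, 1]).
original = [0.5, 0.4, 0.6, 0.3]
personalized = [0.7, 0.2, 0.5, 0.6]
# Hypothetical predictor output: it catches one of the two harmful queries.
keep = [True, False, True, True]

print(diff_perso(original, personalized))
print(fraction_of_ideal_gain(original, personalized, keep))
```

In this toy setup personalization hurts the second and third queries; the hypothetical predictor disables it only for the second, so it recovers part of the gain an oracle would obtain.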
