Unbiased Learning to Rank with Query-Level Click Propensity Estimation: Beyond Pointwise Observation and Relevance
Lulu Yu, Keping Bi, Jiafeng Guo, Shihao Liu, Dawei Yin, Xueqi Cheng
TL;DR
This work extends unbiased learning-to-rank (ULTR) beyond the traditional position-based examination assumption by introducing relevance saturation bias: result lists with more potentially relevant results are more likely to attract clicks, yet users often click only one or two of them, leaving other observed relevant results unclicked. It proposes DualIPW, a dual inverse propensity weighting framework that combines query-level and position-level propensities. The framework is supported by a provably unbiased learning objective and a learned query-level propensity estimator based on log-ratio features and an LSTM. On the real-world Baidu-ULTR dataset, the approach significantly outperforms strong ULTR baselines, and ablations confirm that both the query-level and position-level components are essential. By accounting more faithfully for real user behavior and logging biases, the work makes ULTR more reliable in production settings. The code and dataset information are publicly available for reproduction.
Abstract
Most existing unbiased learning-to-rank (ULTR) approaches are based on the user examination hypothesis, which assumes that users will click a result only if it is both relevant and observed (typically modeled by position). However, in real-world scenarios, users often click only one or two results after examining multiple relevant options, due to limited patience or because their information needs have already been satisfied. Motivated by this, we propose a query-level click propensity model to capture the probability that users will click on different result lists, allowing for non-zero probabilities that users may not click on an observed relevant result. We hypothesize that this propensity increases when more potentially relevant results are present, and refer to this user behavior as relevance saturation bias. Our method introduces a Dual Inverse Propensity Weighting (DualIPW) mechanism -- combining query-level and position-level IPW -- to address both relevance saturation and position bias. Through theoretical derivation, we prove that DualIPW can learn an unbiased ranking model. Experiments on the real-world Baidu-ULTR dataset demonstrate that our approach significantly outperforms state-of-the-art ULTR baselines. The code and dataset information can be found at https://github.com/Trustworthy-Information-Access/DualIPW.
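To make the idea of combining the two propensities concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each click is weighted by the inverse of the product of a query-level propensity and a position-level propensity, and it plugs that weight into a listwise softmax loss chosen purely for illustration. The function names, the loss form, and all numeric values are hypothetical. In the paper the query-level propensity is estimated by a learned model, whereas here it is simply passed in as a number.

```python
import math


def dual_ipw_weights(clicks, position_propensities, query_propensity):
    """Per-document weights: click / (query_propensity * position_propensity).

    Assumption for illustration: the two propensities combine multiplicatively.
    """
    if not 0.0 < query_propensity <= 1.0:
        raise ValueError("query_propensity must be in (0, 1]")
    weights = []
    for click, pos_prop in zip(clicks, position_propensities):
        if not 0.0 < pos_prop <= 1.0:
            raise ValueError("position propensities must be in (0, 1]")
        weights.append(click / (query_propensity * pos_prop))
    return weights


def dual_ipw_softmax_loss(scores, clicks, position_propensities, query_propensity):
    """Weighted listwise softmax loss over one result list (illustrative choice)."""
    # Numerically stable log-sum-exp of the ranking scores.
    max_score = max(scores)
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    weights = dual_ipw_weights(clicks, position_propensities, query_propensity)
    # Each clicked document contributes weight * (-log softmax(score)).
    return sum(w * (log_norm - s) for w, s in zip(weights, scores))


# Hypothetical list of three results with a single click at rank 2.
scores = [2.0, 1.0, 0.5]
clicks = [0, 1, 0]
position_propensities = [1.0, 0.5, 0.25]  # made-up examination probabilities
weights = dual_ipw_weights(clicks, position_propensities, query_propensity=0.8)
loss = dual_ipw_softmax_loss(scores, clicks, position_propensities, query_propensity=0.8)
```

Under this sketch, a click from a query whose result list was unlikely to attract any click (low query-level propensity) receives a larger weight, in the same way that a click at a rarely examined position does under position-level IPW.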
