Whole Page Unbiased Learning to Rank

Haitao Mao; Lixin Zou; Yujia Zheng; Jiliang Tang; Xiaokai Chu; Jiashu Zhao; Qian Wang; Dawei Yin

TL;DR

This paper introduces Whole-page Unbiased Learning to Rank (WP-ULTR) and the BAL algorithm to mitigate biases from all SERP features, not just ranking position. BAL automatically learns a user behavior model via causal discovery and mitigates bias in two ways: by removing confounding through reweighting, and by constraining learning to rely on query-document relevance, using a BERT-based ranking backbone. Experiments on the Baidu-ULTR dataset show that BAL outperforms state-of-the-art PB-ULTR baselines on DCG and ERR metrics, especially for high-frequency queries, and ablation studies confirm the necessity of causal-discovery-driven modeling and comprehensive bias mitigation. The work advances practical unbiased learning for modern search systems by enabling automatic discovery of complex, multi-feature biases and by providing a scalable, explainable framework with real-world impact.
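
The TL;DR mentions removing confounding through reweighting. The summary does not give BAL's exact estimator, so as a hedged illustration the sketch below uses standard inverse propensity scoring (IPS), a common ULTR reweighting technique, extended from position alone to a hypothetical (position, multimedia type) feature key. The propensity table, `FeatureKey`, `ips_pointwise_loss`, and the clipping threshold are assumptions for illustration, not BAL's implementation.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical SERP feature key: (rank position, multimedia type) of a result.
FeatureKey = Tuple[int, str]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ips_pointwise_loss(
    scores: Sequence[float],
    clicks: Sequence[int],
    keys: Sequence[FeatureKey],
    propensity: Dict[FeatureKey, float],
    min_prop: float = 0.05,
) -> float:
    """Mean IPS-weighted log loss over one result list (illustrative sketch).

    Assumes P(click) = P(examine | key) * P(relevant). Then click / P(examine | key)
    has expectation P(relevant), and because the loss is linear in the click,
    its expectation equals the log loss against true relevance.
    """
    total = 0.0
    for score, click, key in zip(scores, clicks, keys):
        # Clip small propensities to bound variance (this adds a small bias).
        p = max(propensity[key], min_prop)
        w = click / p  # unbiased estimate of the relevance label
        prob = sigmoid(score)
        total += -(w * math.log(prob) + (1.0 - w) * math.log(1.0 - prob))
    return total / len(scores)


# Usage with hypothetical propensities: a video result at rank 1 is examined most.
prop = {(1, "video"): 0.9, (1, "text"): 0.7, (5, "text"): 0.2}
loss = ips_pointwise_loss([1.2, -0.3], [1, 0], [(1, "video"), (5, "text")], prop)
```

Averaging this loss over click outcomes recovers the log loss against true relevance whenever the assumed propensities are correct; clipping trades a little bias for lower variance on rarely examined feature combinations.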

Abstract

Page presentation biases in information retrieval systems, especially their effect on click behavior, are a well-known challenge that hinders improving ranking models with implicit user feedback. Unbiased Learning to Rank (ULTR) algorithms have been proposed to learn unbiased ranking models from biased click data. However, most existing algorithms are designed specifically to mitigate position-related bias, e.g., trust bias, without considering biases induced by other features of the search result page presentation (SERP), e.g., attractive bias induced by multimedia content. Unfortunately, such biases are widespread in industrial systems and can lead to an unsatisfactory search experience. We therefore introduce a new problem, whole-page Unbiased Learning to Rank (WP-ULTR), which aims to handle biases induced by whole-page SERP features simultaneously. It presents two major challenges: (1) a suitable user behavior model (user behavior hypothesis) can be hard to find; and (2) complex biases cannot be handled by existing algorithms. To address these challenges, we propose a Bias Agnostic whole-page unbiased Learning to rank algorithm, named BAL, which automatically finds the user behavior model with causal discovery and mitigates the biases induced by multiple SERP features without bias-specific design. Experimental results on a real-world dataset verify the effectiveness of BAL.
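
The abstract says BAL finds the user behavior model with causal discovery, but this summary does not name the discovery algorithm. As a hedged illustration only: constraint-based methods such as the PC algorithm build a causal graph from conditional independence tests, and the sketch below implements one such test (Fisher-z on a partial correlation with a single conditioning variable). It asks whether a SERP feature keeps a direct edge to clicks once the relevance score is conditioned on, mirroring the confounding structure in Figure 2. The helper names, the linear-Gaussian test, and the toy data are assumptions, not the paper's method.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Correlation between x and y after linearly regressing out one variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


def keep_edge(x, y, z, crit: float = 1.96) -> bool:
    """Fisher-z test of 'x independent of y given z' (two-sided, alpha ~ 0.05).

    Returns True when independence is rejected, i.e. the edge x - y survives.
    """
    r = partial_corr(x, y, z)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # guard against log(0) at |r| = 1
    n_eff = len(x) - 1 - 3  # n - |Z| - 3 with a single conditioning variable
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n_eff)
    return abs(stat) > crit


# Hypothetical toy data: ranking puts relevant documents higher, so position and
# click are linked only through relevance; the multimedia flag affects clicks directly.
rel = [1, 2, 3, 4, 5, 6, 7, 8]
pos_noise = [1, -1, -1, 1, 1, -1, -1, 1]
click_noise = [1, 1, -1, -1, -1, -1, 1, 1]
media_sign = [1, -1, -1, 1, -1, 1, 1, -1]
pos = [-r + e for r, e in zip(rel, pos_noise)]
media = [(s + 1) / 2 for s in media_sign]
click = [r + 0.1 * e + 2 * m for r, e, m in zip(rel, click_noise, media)]

print(keep_edge(pos, click, rel), keep_edge(media, click, rel))  # prints: False True
```

Position and click are strongly correlated marginally, yet the test drops that edge once relevance is conditioned on, while the multimedia edge survives; a full PC-style search would repeat such tests over many variable pairs and conditioning sets.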

Paper Structure (37 sections, 10 equations, 5 figures, 3 tables)

Introduction
Preliminary
Learning to Rank
Whole-Page Unbiased Learning to Rank
Bias
Causal graph
The difference between PB-ULTR and WP-ULTR
Bias Agnostic Learning Algorithm
User Behavior Model Design
SEPP feature preprocessing
Causal discovery
Influence score estimation on the SEPP feature
Unbiased learning
Ranking model backbone
Unbiased user behavior model modification
...and 22 more sections

Figures (5)

Figure 1: An overall illustration of the BAL algorithm. The biased observed inputs include the query-document relevance score $\hat{\mathbf{r}}$, the click $\mathbf{c}$, and SEPP features $\mathbf{x}$. BAL is then applied in two steps: (1) the user behavior model design step learns a causal graph with causal discovery; (2) the unbiased learning step then mitigates the biases found in the causal graph to obtain an unbiased ranking model.
Figure 2: Two representative cases showing how the relevance score $\hat{r}$ reveals the confounding bias. The unbiased learning algorithm first removes the confounding bias and then learns the click effect on the relevance score. $\hat{r}$, $r$, $c$, $p$, and $m$ correspond to the relevance score, true relevance, click, position, and multimedia type, respectively.
Figure 3: An illustration of how the causal graph changes during the training procedure on Baidu-ULTR. $r$, $c$, $p$, $m$, $h$, and $mh$ correspond to relevance score, click, position, multimedia types, SEPP height, and maximum SEPP height, respectively.
Figure 4: Average positions, after re-ranking by different ULTR methods, of documents originally in the top-10 positions.
Figure 5: Performance comparison of the naive algorithm, BAL, and BAL variants without the user behavior model design step.
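
For context on Figure 2, many PB-ULTR methods rely on the position-based examination hypothesis: a click requires that a result be examined (the symbol $e$ is introduced here) and be relevant, with examination depending only on position $p$, for query $q$ and document $d$. Purely as an illustrative assumption, a whole-page variant lets examination depend on the SEPP feature vector $\mathbf{x}$ (e.g., $p$ and $m$):

$$P(c = 1 \mid q, d, p) = P(e = 1 \mid p)\,P(r = 1 \mid q, d)$$

$$P(c = 1 \mid q, d, \mathbf{x}) = P(e = 1 \mid \mathbf{x})\,P(r = 1 \mid q, d)$$

Both lines are hand-designed user behavior hypotheses. According to the abstract, BAL instead finds the user behavior model from data with causal discovery, so the second line should be read only as the simplest fixed-form whole-page case.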
