Mitigate Position Bias with Coupled Ranking Bias on CTR Prediction
Yao Zhao, Zhining Liu, Tianchi Cai, Haipeng Zhang, Chenyi Zhuang, Jinjie Gu
TL;DR
The paper tackles position bias in CTR prediction when ranking bias occurs jointly with it, and shows that ranking bias can cause the position effect to be overestimated. The position effect is measured by the position gradient, $\text{Position Gradient} := \mathbb{E}_{P \sim (i \perp k | u)} \frac{\partial [y \mid u, i, k]}{\partial k}$. The paper proposes gradient interpolation (GI), a two-model fusion in which CTR is predicted as $(1-\epsilon)p^a + \epsilon p^u$ and the position gradient scales as $(1-\epsilon)g$, with an analytically derived optimal fusion weight $\epsilon = \frac{\sum_i (p^g_i - p^u_i)(p^a_i - p^u_i)}{\sum_i (p^a_i - p^u_i)^2}$. A randomization-based implementation makes GI practical for training and serving, and the method can adapt $\epsilon$ automatically from a small number of random-ranking samples. Offline and online evaluations on synthetic and industrial datasets show that GI outperforms baselines (e.g., ST-PSF, PAL, DPIN) in AUC and CTR, with offline AUC improvements (e.g., synthetic: 0.724 → 0.743; industrial: 0.697 → 0.707) and online gains (shop +3.43%, goods +2.69%). The work offers a principled approach to mitigating coupled position and ranking biases in CTR models, and leaves deriving the fusion weight without random rankings to future work.
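The fusion step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `fuse_ctr`, the toy prediction arrays, and the value of `eps` are hypothetical. The sketch also assumes that the second model's prediction does not vary with position, which is what makes the fused position gradient equal to $(1-\epsilon)g$.

```python
import numpy as np


def fuse_ctr(p_a, p_u, eps):
    """Fuse two CTR predictions as (1 - eps) * p_a + eps * p_u."""
    p_a = np.asarray(p_a, dtype=float)
    p_u = np.asarray(p_u, dtype=float)
    return (1.0 - eps) * p_a + eps * p_u


# Hypothetical toy predictions for one (user, item) pair at positions k = 0..4.
positions = np.arange(5)
p_a = 0.30 * np.exp(-0.4 * positions)  # toy prediction that decays with position
p_u = np.full(5, 0.12)                 # assumption: constant across positions
eps = 0.25                             # hypothetical fusion weight

fused = fuse_ctr(p_a, p_u, eps)

# Finite differences stand in for the gradient with respect to position k.
g = np.diff(p_a)
g_fused = np.diff(fused)

# Because p_u is flat in k, the fused gradient is (1 - eps) times the original.
assert np.allclose(g_fused, (1.0 - eps) * g)
```

Setting `eps = 0` recovers the first model's prediction and its full position gradient. Setting `eps = 1` recovers the second model's prediction and removes the position gradient entirely.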
Abstract
Position bias, i.e., the effect of an item's display position on users' preference for that item, is well studied in the recommender system literature. However, most existing methods ignore the ranking bias that is commonly coupled with it and that is also related to the item's display position. Using both synthetic and industrial datasets, we first show how this widely coexisting ranking bias degrades the performance of existing position bias estimation methods. To mitigate position bias in the presence of ranking bias, we propose a novel position bias estimation method, namely gradient interpolation, which fuses two estimation methods using a fusing weight. We further propose an adaptive method that automatically determines the optimal fusing weight. Extensive experiments on both synthetic and industrial datasets demonstrate the superior performance of the proposed methods.
