Understanding the Ranking Loss for Recommendation with Sparse User Feedback
Zhutian Lin, Junwei Pan, Shangyu Zhang, Ximei Wang, Xi Xiao, Shudong Huang, Lei Xiao, Jie Jiang
TL;DR
This work addresses a challenge in CTR prediction: when positive feedback is scarce, optimizing binary cross entropy (BCE) alone suffers from gradient vanishing on negative samples. By adding an auxiliary ranking loss (as in Combined-Pair), the model obtains larger gradients on negative samples, which improves both the classification objective and ranking performance. The authors present theoretical gradient analyses, offline experiments on synthetically sparsified data, and online deployments at Tencent that together show consistent improvements (lower BCE loss, higher AUC, and higher GMV) across multiple scenarios, including new ads. They further examine stability and compatibility, and extend the approach with variants such as Combined-Contrastive, Focal Loss, and other ranking-loss combinations, demonstrating broad practical benefits for online advertising systems.
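The gradient argument can be sketched as follows. This assumes the combined objective is the batch-mean BCE loss plus a hypothetical weight α times a RankNet-style pairwise logistic loss averaged over all positive-negative pairs; this weighting and normalization are assumptions, and the paper's exact formulation may differ. Here z_k is the logit of sample k, p_k = σ(z_k), n is the batch size, P and N are the positive and negative index sets, π is the positive rate, and j ∈ N.

```latex
\begin{align*}
\mathcal{L}_{\mathrm{BCE}} &= -\frac{1}{n}\sum_{k=1}^{n}\big[y_k \log p_k + (1-y_k)\log(1-p_k)\big], \\
\mathcal{L}_{\mathrm{pair}} &= -\frac{1}{|P|\,|N|}\sum_{i\in P}\sum_{j\in N}\log\sigma(z_i - z_j), \qquad
\mathcal{L} = \mathcal{L}_{\mathrm{BCE}} + \alpha\,\mathcal{L}_{\mathrm{pair}}, \\
\frac{\partial \mathcal{L}_{\mathrm{BCE}}}{\partial z_j} &= \frac{p_j}{n}, \qquad
\frac{\partial \mathcal{L}_{\mathrm{pair}}}{\partial z_j} = \frac{1}{|P|\,|N|}\sum_{i\in P}\sigma(z_j - z_i), \qquad j \in N, \\
\text{if } p_j \approx \pi,\ z_i \approx z_j,\ |N| \approx n:\quad
\frac{\partial \mathcal{L}_{\mathrm{BCE}}}{\partial z_j} &\approx \frac{\pi}{n}, \qquad
\frac{\partial \mathcal{L}_{\mathrm{pair}}}{\partial z_j} \approx \frac{1}{2n}.
\end{align*}
```

Under these assumptions the ratio of the two gradients on a negative sample is roughly 1/(2π). For example, π = 0.01 gives about 50, so the ranking term supplies much larger gradients on negatives when positive feedback is sparse. This matches the mechanism described above.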
Abstract
Click-through rate (CTR) prediction is a crucial area of research in online advertising. While binary cross entropy (BCE) has been widely used as the optimization objective for treating CTR prediction as a binary classification problem, recent work has shown that combining BCE loss with an auxiliary ranking loss can significantly improve performance. However, the reasons behind the effectiveness of this combination loss are not yet fully understood. In this paper, we uncover a new challenge associated with the BCE loss in scenarios where positive feedback is sparse: the issue of gradient vanishing for negative samples. We introduce a novel perspective on the effectiveness of the auxiliary ranking loss in CTR prediction: it generates larger gradients on negative samples, thereby mitigating the optimization difficulties that arise when using the BCE loss alone and resulting in improved classification ability. To validate our perspective, we conduct a theoretical analysis and extensive empirical evaluations on public datasets. Additionally, we successfully integrate the ranking loss into Tencent's online advertising system, achieving notable lifts of 0.70% and 1.26% in Gross Merchandise Value (GMV) for two main scenarios. The code is publicly available at: https://github.com/SkylerLinn/Understanding-the-Ranking-Loss.
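To illustrate the gradient-vanishing argument numerically, here is a small pure-Python sketch. It compares, for each negative sample j, the gradient of the batch-mean BCE loss, σ(z_j)/n, with the gradient of a pairwise logistic ranking loss, Σ_i σ(z_j − z_i)/(|P||N|). Several things are assumptions, not taken from the paper or its repository: the RankNet-style pairwise loss as a stand-in for the Combined-Pair ranking term, the weight `alpha`, the helper names, and the simulated early-training batch (1,000 samples, 1% positives, all logits near logit(0.01)).

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def split(logits, labels):
    """Return (positive logits, negative logits), preserving batch order."""
    pos = [z for z, y in zip(logits, labels) if y == 1]
    neg = [z for z, y in zip(logits, labels) if y == 0]
    return pos, neg


def bce_loss(logits, labels):
    """Mean binary cross entropy over the batch, computed from logits."""
    total = sum(-log_sigmoid(z) if y == 1 else -log_sigmoid(-z)
                for z, y in zip(logits, labels))
    return total / len(logits)


def pair_loss(logits, labels):
    """Mean RankNet-style pairwise logistic loss over all (positive, negative) pairs.
    Assumed stand-in for the auxiliary ranking term, not the paper's exact loss."""
    pos, neg = split(logits, labels)
    total = sum(-log_sigmoid(zi - zj) for zi in pos for zj in neg)
    return total / (len(pos) * len(neg))


def bce_grad_neg(logits, labels):
    """Analytic d(bce_loss)/dz_j for each negative j: sigmoid(z_j) / n."""
    _, neg = split(logits, labels)
    return [sigmoid(zj) / len(logits) for zj in neg]


def pair_grad_neg(logits, labels):
    """Analytic d(pair_loss)/dz_j for each negative j: sum_i sigmoid(z_j - z_i) / (|P| |N|)."""
    pos, neg = split(logits, labels)
    denom = len(pos) * len(neg)
    return [sum(sigmoid(zj - zi) for zi in pos) / denom for zj in neg]


def sparse_batch(n=1000, n_pos=10, base_rate=0.01, noise=0.1, seed=0):
    """Hypothetical early-training batch: every logit sits near logit(base_rate),
    so positives and negatives are not yet separated."""
    rng = random.Random(seed)
    base = math.log(base_rate / (1.0 - base_rate))
    labels = [1] * n_pos + [0] * (n - n_pos)
    logits = [base + rng.gauss(0.0, noise) for _ in range(n)]
    return logits, labels


logits, labels = sparse_batch()
g_bce = bce_grad_neg(logits, labels)
g_pair = pair_grad_neg(logits, labels)

alpha = 1.0  # hypothetical weight in L = L_BCE + alpha * L_pair
g_combined = [gb + alpha * gp for gb, gp in zip(g_bce, g_pair)]

mean_bce = sum(g_bce) / len(g_bce)
mean_pair = sum(g_pair) / len(g_pair)
mean_combined = sum(g_combined) / len(g_combined)
print(f"mean dL/dz over negatives  BCE: {mean_bce:.2e}  "
      f"pairwise: {mean_pair:.2e}  combined: {mean_combined:.2e}")
```

With these settings, σ(z_j) is close to 0.01 while σ(z_j − z_i) is close to 0.5 for unseparated pairs. The pairwise gradient on negatives should therefore come out roughly 50 times larger than the BCE gradient. Lowering `base_rate` widens the gap, which is the sparse-feedback regime the abstract describes.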
