Pairwise Ranking Loss for Multi-Task Learning in Recommender Systems

Furkan Durmus; Hasan Saribas; Said Aldemir; Junyan Yang; Hakan Cevikalp

TL;DR

The paper addresses the challenge of jointly predicting CTR and CVR in recommender systems, where a conversion can only occur after a click and CTR signals are often noisy. It introduces a task-specific pairwise ranking loss, PWiseR, which is added to Binary Cross-Entropy (BCE) to emphasize exposures that lead to conversions. PWiseR consists of two margin-based pairwise terms that push CVR-positive samples above click-only (CTR-only) samples and above non-click samples. The total objective is $\text{Loss} = \text{BCE} + \lambda\,\text{PWiseR}$, and the method is compatible with various MTL architectures as well as STL baselines. Experiments on four Alibaba public datasets and a proprietary industrial dataset show that PWiseR improves AUC in most settings, supporting its use for robust ranking and revenue-oriented optimization in online advertising. The work offers a practical, model-agnostic way to reduce the influence of noisy CTR data while aligning learning with conversion-driven objectives, which may improve eCPM-driven performance in real systems. The integrated click-and-conversion likelihood that the approach seeks to optimize is $pCTCVR = pCTR \times pCVR$.
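The objective summarized above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hinge form max(0, m - (s_pos - s_neg)), the hypothetical margin values `margin_click` and `margin_nonclick`, the weight `lam`, the choice to rank the pCTR scores, and applying BCE to both the CTR and CTCVR heads are all illustrative choices, since the summary does not state the exact formula.

```python
import math
from typing import List, Sequence


def bce(preds: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


def mean_pairwise_hinge(high: List[float], low: List[float], margin: float) -> float:
    """Mean of max(0, margin - (s_high - s_low)) over all (high, low) pairs.

    Returns 0.0 when either group is empty, i.e. the batch has no such pairs.
    """
    if not high or not low:
        return 0.0
    total = sum(max(0.0, margin - (h - l)) for h in high for l in low)
    return total / (len(high) * len(low))


def pwiser(scores: Sequence[float], y_ctr: Sequence[int], y_cvr: Sequence[int],
           margin_click: float = 0.1, margin_nonclick: float = 0.2) -> float:
    """Hypothetical PWiseR sketch: rank converted exposures above
    click-only exposures and above non-click exposures."""
    conv = [s for s, v in zip(scores, y_cvr) if v == 1]
    click_only = [s for s, c, v in zip(scores, y_ctr, y_cvr) if c == 1 and v == 0]
    non_click = [s for s, c in zip(scores, y_ctr) if c == 0]
    return (mean_pairwise_hinge(conv, click_only, margin_click)
            + mean_pairwise_hinge(conv, non_click, margin_nonclick))


def total_loss(p_ctr: Sequence[float], p_cvr: Sequence[float],
               y_ctr: Sequence[int], y_cvr: Sequence[int], lam: float = 0.5) -> float:
    """Loss = BCE + lam * PWiseR, with BCE on the CTR and CTCVR heads (assumed)."""
    p_ctcvr = [a * b for a, b in zip(p_ctr, p_cvr)]  # pCTCVR = pCTR * pCVR
    y_ctcvr = [c * v for c, v in zip(y_ctr, y_cvr)]  # converted implies clicked
    bce_term = bce(p_ctr, y_ctr) + bce(p_ctcvr, y_ctcvr)
    return bce_term + lam * pwiser(p_ctr, y_ctr, y_cvr)


# Usage: one converted, one click-only and two non-click exposures.
y_ctr = [1, 1, 0, 0]
y_cvr = [1, 0, 0, 0]
p_ctr = [0.5, 0.7, 0.4, 0.1]
p_cvr = [0.6, 0.3, 0.2, 0.1]
print(pwiser(p_ctr, y_ctr, y_cvr))  # ≈ 0.35 (0.3 click-only pair + 0.05 non-click pairs)
print(total_loss(p_ctr, p_cvr, y_ctr, y_cvr))
```

The nested comprehension makes the cost proportional to the number of (converted, negative) pairs in a batch; a training-framework version would typically vectorize the pairwise differences with broadcasting instead.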

Abstract

Multi-Task Learning (MTL) plays a crucial role in real-world advertising applications such as recommender systems, where it aims to learn robust representations while minimizing resource consumption. MTL optimizes multiple tasks simultaneously to build a single model that serves diverse objectives. In online advertising systems, Click-Through Rate (CTR) and Conversion Rate (CVR) prediction are often treated together as an MTL problem. However, it has been overlooked that a conversion ($y_{cvr}=1$) necessitates a preceding click ($y_{ctr}=1$). In other words, some clicks are followed by a conversion, while others are not. Moreover, noise is significantly more likely in clicks without a conversion than in clicks with one, and existing methods cannot differentiate between these two scenarios. In this study, exposure labels corresponding to conversions are regarded as definitive indicators, and a novel task-specific loss is introduced that computes a pairwise ranking (PWiseR) loss between model predictions to encourage the model to rely more on these labels. To demonstrate the effect of the proposed loss function, experiments were conducted on different MTL and Single-Task Learning (STL) models using four distinct public MTL datasets, namely Alibaba FR, NL, US, and CCP, along with a proprietary industrial dataset. The results indicate that the proposed loss function outperforms the BCE loss function in terms of AUC in most cases.
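The click-before-conversion constraint can be written as a probability factorization. The derivation below is a standard one, assuming that pCVR denotes the post-click conversion probability $p(y_{cvr}=1 \mid y_{ctr}=1, x)$ for an exposure with features $x$; under that assumption it yields the $pCTCVR = pCTR \times pCVR$ relation used in the TL;DR.

```latex
\begin{aligned}
pCTCVR &= p(y_{ctr}=1,\, y_{cvr}=1 \mid x) \\
       &= p(y_{ctr}=1 \mid x)\; p(y_{cvr}=1 \mid y_{ctr}=1,\, x) \\
       &= pCTR \times pCVR, \\
p(y_{ctr}=0,\, y_{cvr}=1 \mid x) &= 0 .
\end{aligned}
```

The last line is why every exposure falls into exactly one of three groups: converted ($y_{cvr}=1$, hence $y_{ctr}=1$), clicked but not converted, and not clicked.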

Paper Structure

This paper contains 8 sections, 2 equations, 1 figure, and 6 tables.

Introduction
Related Works
Method
Problem Definition
Proposed Loss Function
Experiments
Experimental Setup
Conclusion

Figures (1)

Figure 1: Demonstration of the proposed loss function for the training phase together with the overall MTL architecture.
