Recall-Augmented Ranking: Enhancing Click-Through Rate Prediction Accuracy with Cross-Stage Data
Junjie Huang, Guohao Cai, Jieming Zhu, Zhenhua Dong, Ruiming Tang, Weinan Zhang, Yong Yu
TL;DR
The paper tackles the limitation that CTR prediction models often rely on scarce and homogeneous user histories. It proposes Recall-Augmented Ranking (RAR), a cross-stage framework with Cross-Stage User/Item Selection and Co-Interaction modules to enrich user representations using look-alike users (u2u) and recall items (i2i), enabling cross-instance modeling and compatibility with existing CTR models. The approach combines a fast SimHash-based selection method with a set-to-set co-interaction mechanism and a joint objective that balances CTR accuracy with recall-item supervision, achieving up to 4.7% absolute AUC gains across multiple datasets. This work demonstrates that cross-stage data can robustly enhance CTR performance and provides a practical, plug-in framework for practitioners seeking improved cross-instance user profiling in large-scale recommender systems.
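To make the SimHash-based selection step concrete, here is a minimal sketch of picking look-alike users by Hamming distance over sign-bit codes. It assumes the random-hyperplane variant of SimHash applied to dense user embeddings; the function names (`simhash_codes`, `select_lookalikes`), the 64-bit code length, and the exclusion of the target user are illustrative assumptions, not details taken from the paper, whose selection module may differ.

```python
import numpy as np

def simhash_codes(embeddings, n_bits=64, seed=0):
    """Map each embedding row to n_bits sign bits via random hyperplanes.

    Assumption: random-hyperplane SimHash; the paper's exact hashing
    scheme is not specified in this summary.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((embeddings.shape[1], n_bits))
    # One bit per hyperplane: True if the embedding lies on its positive side.
    return (embeddings @ planes) > 0

def select_lookalikes(codes, target_idx, k):
    """Return indices of the k users whose codes are closest in Hamming distance."""
    dist = np.count_nonzero(codes != codes[target_idx], axis=1)
    # Push the target user past the maximum possible distance to exclude it.
    dist[target_idx] = codes.shape[1] + 1
    return np.argsort(dist, kind="stable")[:k]
```

Comparing short binary codes instead of full embeddings is what keeps this kind of selection cheap: vectors separated by a small angle agree on most bits, so Hamming distance serves as a fast proxy for cosine similarity.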
Abstract
Click-through rate (CTR) prediction plays an indispensable role in online platforms. Numerous models have been proposed to capture users' shifting preferences by leveraging user behavior sequences. However, these historical sequences often suffer from severe homogeneity and scarcity compared to the extensive item pool. Relying solely on such sequences for user representations is inherently restrictive, as user interests extend beyond the scope of items they have previously engaged with. To address this challenge, we propose a data-driven approach to enrich user representations. We recognize user profiling and recall items as two ideal data sources within the cross-stage framework, encompassing the u2u (user-to-user) and i2i (item-to-item) aspects, respectively. In this paper, we propose a novel architecture named Recall-Augmented Ranking (RAR). RAR consists of two key sub-modules, which synergistically gather information from a vast pool of look-alike users and recall items, resulting in enriched user representations. Notably, RAR is orthogonal to many existing CTR models, allowing for consistent performance improvements in a plug-and-play manner. Extensive experiments verify the efficacy and compatibility of RAR against state-of-the-art (SOTA) methods.
