Moment&Cross: Next-Generation Real-Time Cross-Domain CTR Prediction for Live-Streaming Recommendation at Kuaishou
Jiangxia Cao, Shen Wang, Yue Li, Shenghui Wang, Jian Tang, Shiyao Wang, Shuang Yang, Zhaojie Liu, Guorui Zhou
TL;DR
Moment&Cross tackles real-time cross-domain CTR prediction for live-streaming on Kuaishou, addressing temporal content shifts, feedback delays, and data sparsity through a dual mechanism: Moment enables 30-second real-time data streaming with first-only label-mask learning, while Cross transfers rich short-video interests via General and Exact Search Units and contrastive alignment. The framework combines fast real-time feedback with cross-domain history to improve predictions, using multi-task ranking, a tailored loss design ($\mathcal{L}_{fast}$, $\mathcal{L}_{slow}$, $\mathcal{L}_{moment}$), a composite Ranking_Score, and cross-domain contrastive objectives. Offline results show consistent gains in AUC/GAUC, and ablations confirm the value of each component. Online A/B tests demonstrate improvements in Click and Watch Time, with notable benefits for low-gift users, validating real-time adaptation and cross-domain transfer at scale. The approach is deployed to serve 400 million users, illustrating practical impact in live-streaming recommendation, where identifying highlight moments and surfacing related live-streams are critical.
Abstract
Kuaishou is one of the largest short-video and live-streaming platforms. Compared with short-video recommendation, live-streaming recommendation is more complex because: (1) live-streams are only temporarily alive for distribution, (2) users may watch for a long time, so feedback is delayed, and (3) content is unpredictable and changes over time. Indeed, even if a user is interested in the live-streaming author, an exposure may still result in a negative watch (e.g., a short view < 3s) when the real-time content is not attractive enough. Live-streaming recommendation therefore faces a challenging task: how do we recommend a live-stream to users at the right moment? Additionally, the major exposed content on our platform is short-video, and the amount of exposed short-video is 9x more than exposed live-streaming. Users therefore leave more behaviors on short-videos, which leads to a serious data imbalance problem: live-streaming data alone cannot fully reflect user interests. This raises another challenging task: how do we utilize users' short-video behaviors to improve live-streaming recommendation?
