OneLive: Dynamically Unified Generative Framework for Live-Streaming Recommendation
Shen Wang, Yusheng Huang, Ruochen Yang, Shuang Wen, Pengbo Xu, Jiangxia Cao, Yueyang Liu, Kuo Cai, Chengcheng Guo, Shiyao Wang, Xinchen Luo, Qiang Luo, Ruiming Tang, Shuang Yang, Zhaojie Liu, Guorui Zhou, Han Li, Kun Gai
TL;DR
OneLive addresses the challenges specific to live-streaming recommendation with a dynamically unified generative framework. It combines a Dynamic Tokenizer that fuses real-time content with behavior signals, a decoder-only core that uses Time-Aware Gated Attention, Sequential Multi-Token Prediction (MTP), and QK Norm, and a Unified Multi-Objective Alignment stage based on reinforcement learning. Offline experiments and large-scale online A/B tests on Kuaishou show significant improvements over cascaded pipelines and prior generative methods, including gains in HR, MRR, and CTR, along with latency and throughput benefits. The work offers a practical end-to-end solution for real-time content evolution, heterogeneous user signals, and multi-objective optimization in live streaming, and it has been deployed to serve hundreds of millions of users daily.
Abstract
Live-streaming recommender systems serve as critical infrastructure that bridges real-time interactions between users and authors. Like traditional industrial recommender systems, live-streaming recommendation relies on cascade architectures to support large-scale concurrency. Recent advances in generative recommendation unify the multi-stage recommendation process with Transformer-based architectures, offering improved scalability and higher computational efficiency. However, these methods do not transfer directly to the live-streaming scenario, where continuously evolving content, limited lifecycles, strict real-time constraints, and heterogeneous objectives invalidate static tokenization and conventional model frameworks. To address these issues, we propose OneLive, a dynamically unified generative recommendation framework tailored to the live-streaming scenario. OneLive integrates four key components: (i) a Dynamic Tokenizer that continuously encodes evolving real-time live content, fused with behavior signals, through residual quantization; (ii) a Time-Aware Gated Attention mechanism that explicitly models temporal dynamics for timely decision making; (iii) an efficient decoder-only generative architecture enhanced with Sequential MTP and QK Norm for stable training and accelerated inference; and (iv) a Unified Multi-Objective Alignment Framework that reinforces policy optimization toward personalized preferences.
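To make the residual quantization step in component (i) concrete, the following is a minimal sketch of generic greedy residual quantization, not the OneLive tokenizer itself. The function name `residual_quantize`, the two-level toy codebooks, and the input vector are illustrative assumptions; the abstract does not specify codebook sizes, how content and behavior signals are fused, or how the codebooks are updated in real time.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Greedy residual quantization of a single embedding (illustrative only).

    At each level the codeword nearest to the current residual is selected,
    its index is emitted as one token, and the codeword is subtracted from
    the residual before moving to the next level.
    """
    residual = [float(v) for v in x]
    reconstruction = [0.0] * len(residual)
    codes: List[int] = []
    for codebook in codebooks:
        # Nearest codeword to the current residual (squared Euclidean distance).
        best = min(
            range(len(codebook)),
            key=lambda k: sum((r - c) ** 2 for r, c in zip(residual, codebook[k])),
        )
        codes.append(best)
        reconstruction = [a + c for a, c in zip(reconstruction, codebook[best])]
        residual = [r - c for r, c in zip(residual, codebook[best])]
    return codes, reconstruction


# Hypothetical two-level codebooks over 2-dimensional embeddings.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # level 1: coarse codewords
    [[0.0, 0.0], [0.5, 0.0]],  # level 2: finer corrections
]
codes, reconstruction = residual_quantize([1.4, 1.0], toy_codebooks)
print(codes, reconstruction)
```

The resulting list of indices, one per level, is the kind of coarse-to-fine token sequence that a generative recommender can decode autoregressively. In a live-streaming setting the input embedding would change as the stream evolves, so the codes would need to be recomputed over time; how OneLive does this is not described in the abstract.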
