OneLive: Dynamically Unified Generative Framework for Live-Streaming Recommendation

Shen Wang, Yusheng Huang, Ruochen Yang, Shuang Wen, Pengbo Xu, Jiangxia Cao, Yueyang Liu, Kuo Cai, Chengcheng Guo, Shiyao Wang, Xinchen Luo, Qiang Luo, Ruiming Tang, Shuang Yang, Zhaojie Liu, Guorui Zhou, Han Li, Kun Gai

TL;DR

OneLive addresses the challenges specific to live-streaming recommendation with a dynamically unified generative framework. It combines three parts: a Dynamic Tokenizer that fuses real-time content with behavior signals; a decoder-only core that uses Time-Aware Gated Attention, Sequential Multi-Token Prediction (MTP), and QK Norm; and a Unified Multi-Objective Reinforcement Learning alignment stage. Offline experiments and large-scale online A/B tests on Kuaishou show significant improvements over cascaded pipelines and prior generative methods, including gains in hit rate (HR), mean reciprocal rank (MRR), and click-through rate (CTR), along with latency and throughput benefits. The work offers a practical end-to-end solution for real-time content evolution, heterogeneous user signals, and multi-objective optimization in live streaming, and it has been deployed to serve hundreds of millions of users daily.

Abstract

Live-streaming recommender systems serve as critical infrastructure that bridges real-time interaction patterns between users and authors. Like traditional industrial recommender systems, live-streaming recommendation relies on cascade architectures to support large-scale concurrency. Recent advances in generative recommendation unify the multi-stage recommendation process with Transformer-based architectures, offering improved scalability and higher computational efficiency. However, these methods do not transfer directly to the live-streaming scenario, where continuously evolving content, limited lifecycles, strict real-time constraints, and heterogeneous multiple objectives introduce unique challenges that invalidate static tokenization and conventional model frameworks. To address these issues, we propose OneLive, a dynamically unified generative recommendation framework tailored for the live-streaming scenario. OneLive integrates four key components: (i) a Dynamic Tokenizer that continuously encodes evolving real-time live content fused with behavior signals through residual quantization; (ii) a Time-Aware Gated Attention mechanism that explicitly models temporal dynamics for timely decision making; (iii) an efficient decoder-only generative architecture enhanced with Sequential MTP and QK Norm for stable training and accelerated inference; and (iv) a Unified Multi-Objective Alignment Framework that reinforces policy optimization for personalized preferences.
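
The abstract names residual quantization as the mechanism behind the Dynamic Tokenizer but does not give details here. As a rough illustration of the general technique only, the sketch below quantizes an embedding level by level: each level picks the nearest codeword and passes the leftover residual to the next level, so the list of chosen indices acts as the item's token sequence. The codebook values, the two-level depth, and the function names `nearest_index` and `residual_quantize` are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to vec (squared L2 distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def residual_quantize(
    embedding: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize an embedding with one codebook per level.

    Each level encodes what the previous levels failed to capture.
    Returns the chosen indices (the tokens) and the final residual.
    """
    residual = list(embedding)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest_index(residual, codebook)
        tokens.append(idx)
        # Subtract the chosen codeword; the remainder goes to the next level.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


# Hypothetical toy codebooks: a coarse level followed by a finer level.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],
]

tokens, residual = residual_quantize([1.2, 0.8], CODEBOOKS)
print(tokens)    # [1, 1]
print(residual)  # approximately [-0.05, 0.05]
```

In a live-streaming setting the input embedding would presumably be recomputed as the stream's content changes, so the same stream could map to different token sequences over time. How OneLive trains and refreshes its codebooks is not described in this section.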

Paper Structure (35 sections, 27 equations, 6 figures, 6 tables)


Figures (6)

  • Figure 1: (a) Over a live-streaming lifecycle, the author exhibits diverse content types, accompanied by multi-objective user interactions. (b) We cluster 30-second segments across entire live streams and report the mean number of clusters to quantify continuous content dynamics.
  • Figure 2: Overall framework of OneLive.
  • Figure 3: The influence of QK Norm on training stability, which prevents the explosion of loss and max QK logits.
  • Figure 4: Loss curves under parameter scaling.
  • Figure 5: Stratified test of online performance.
  • ...and 1 more figure
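
The Figure 3 caption credits QK Norm with preventing the explosion of the loss and of the maximum QK logits. A minimal sketch of why normalizing queries and keys bounds the logits is given below. It assumes RMS normalization without a learnable gain and a single query-key pair; the paper's exact normalization is not stated in this section, and the names `rms_norm` and `attention_logit` are hypothetical.

```python
import math
from typing import List, Sequence


def rms_norm(vec: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Scale vec so that its root-mean-square is (just under) 1."""
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms for x in vec]


def attention_logit(q: Sequence[float], k: Sequence[float], use_qk_norm: bool) -> float:
    """Scaled dot-product logit for one query-key pair."""
    if use_qk_norm:
        # Assumption: normalize both query and key before the dot product.
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


# Deliberately large activations, as might occur late in unstable training.
q = [100.0, -200.0, 300.0, 50.0]
k = [80.0, -150.0, 250.0, 10.0]

raw_logit = attention_logit(q, k, use_qk_norm=False)
normed_logit = attention_logit(q, k, use_qk_norm=True)
print(raw_logit, normed_logit)
```

After RMS normalization each vector has Euclidean norm at most sqrt(d), so by the Cauchy-Schwarz inequality the scaled logit cannot exceed sqrt(d) in magnitude (2 for the 4-dimensional vectors here), whereas the unnormalized logit grows with the activation scale. Implementations usually add a learnable gain after the normalization, which this sketch omits.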