LLM-Alignment Live-Streaming Recommendation
Yueyang Liu, Jiangxia Cao, Shen Wang, Shuang Wen, Xiang Chen, Xiangyu Wu, Shuang Yang, Zhaojie Liu, Kun Gai, Guorui Zhou
TL;DR
This work tackles the variability and uncertainty of live-streaming recommendations by aligning multi-modal LLM knowledge with RecSys signals through a gated embedding framework (LARM). It introduces a 30-second live-streaming LLM tuning pipeline, a two-tower gated fusion module that combines author IDs and LLM embeddings, and a hierarchical semantic-code quantification that compresses real-time semantics for user histories. Offline and online experiments in an industrial setting demonstrate consistent retrieval and ranking gains, improved embedding alignment, and meaningful semantic clustering of content, with particular benefits for long-tail authors. The approach offers a practical pathway to deploy multi-modal, real-time semantic understanding within RecSys, enabling more accurate and contextually relevant live-streaming recommendations at scale.
Abstract
In recent years, integrated short-video and live-streaming platforms have gained massive global adoption, offering dynamic content creation and consumption. Unlike pre-recorded short videos, live-streaming enables real-time interaction between authors and users, fostering deeper engagement. However, this dynamic nature introduces a critical challenge for recommendation systems (RecSys): the same live stream can offer vastly different experiences depending on when a user watches it. To optimize recommendations, a RecSys must accurately interpret the real-time semantics of live content and align them with user preferences.
