
LLM-Alignment Live-Streaming Recommendation

Yueyang Liu, Jiangxia Cao, Shen Wang, Shuang Wen, Xiang Chen, Xiangyu Wu, Shuang Yang, Zhaojie Liu, Kun Gai, Guorui Zhou

TL;DR

This work tackles the variability and uncertainty of live-streaming recommendation by aligning multi-modal LLM knowledge with RecSys signals through a gated embedding framework (LARM). It introduces a pipeline that tunes an LLM to produce a live-streaming embedding every 30 seconds, a two-tower gated fusion mechanism that combines author-ID embeddings with LLM embeddings, and a hierarchical semantic-code quantization scheme that compresses real-time semantics into compact codes for user histories. Offline and online experiments in an industrial setting show consistent retrieval and ranking gains, better embedding alignment, and meaningful semantic clustering of content, with particular benefits for long-tail authors. The approach offers a practical way to deploy multi-modal, real-time semantic understanding within RecSys, enabling more accurate and contextually relevant live-streaming recommendations at scale.
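The TL;DR names a two-tower gated fusion of author-ID and LLM embeddings but gives no equations. The sketch below is a minimal illustration under stated assumptions: a per-dimension sigmoid gate, a linear projection of the LLM embedding into the ID space, and dot-product scoring between a user vector and the fused author vector. All function names, weights, and dimensions are hypothetical and are not LARM's published formulation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def matvec(w: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: one output per row of w.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_fusion(id_emb: Vector, llm_emb: Vector, w_proj: Matrix,
                 w_gate: Matrix, b_gate: Vector) -> Vector:
    """Hypothetical element-wise gate between an author-ID embedding and an LLM embedding.

    h   = W_proj @ llm_emb                   (project LLM space into ID space)
    g   = sigmoid(W_gate @ [id_emb; h] + b)  (per-dimension gate in (0, 1))
    out = g * id_emb + (1 - g) * h
    """
    h = matvec(w_proj, llm_emb)
    gate_input = id_emb + h  # list concatenation [id_emb; h]
    g = [sigmoid(z + b) for z, b in zip(matvec(w_gate, gate_input), b_gate)]
    return [gi * a + (1.0 - gi) * p for gi, a, p in zip(g, id_emb, h)]


def two_tower_score(user_vec: Vector, author_vec: Vector) -> float:
    # Assumed two-tower matching: dot product of user and fused author vectors.
    return sum(u * a for u, a in zip(user_vec, author_vec))


if __name__ == "__main__":
    id_emb = [1.0, -1.0, 0.5, 0.0]
    llm_emb = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    # Toy projection that keeps the first four LLM dimensions.
    w_proj = [[1.0 if j == i else 0.0 for j in range(6)] for i in range(4)]
    w_gate = [[0.0] * 8 for _ in range(4)]
    fused = gated_fusion(id_emb, llm_emb, w_proj, w_gate, [0.0] * 4)
    print(fused)  # gate is 0.5 everywhere, so this averages id_emb and h
    print(two_tower_score([1.0, 0.0, 0.0, 0.0], fused))
```

With zero gate weights and zero bias the gate sits at 0.5 and the fused vector is the midpoint of the two embeddings; a large positive bias recovers the pure ID embedding, and a large negative bias recovers the projected LLM embedding. In a trained model the gate would be learned per author and dimension, which is the design choice that lets such a fusion shift weight between ID signals and content semantics.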

Abstract

In recent years, integrated short-video and live-streaming platforms have gained massive global adoption, offering dynamic content creation and consumption. Unlike pre-recorded short videos, live streaming enables real-time interaction between authors and users, fostering deeper engagement. However, this dynamic nature introduces a critical challenge for recommendation systems (RecSys): the same live stream can offer vastly different experiences depending on when a user watches it. To optimize recommendations, a RecSys must accurately interpret the real-time semantics of live content and align them with user preferences.

Paper Structure

This paper contains 24 sections, 8 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Two types of conflicts: (i) an author's different live streams can cover different topics; (ii) within a single live stream, different users may watch different intervals.
  • Figure 2: The workflow of LARM: (1) tuning an LLM to produce a real-time live-streaming embedding every 30 s; (2) a gated fusion mechanism that aligns the RecSys ID space with the multi-modal LLM space; (3) quantizing the real-time aligned author embedding to record the specific semantics of the corresponding live stream during each user's watching interval, so that user interests are captured accurately.
  • Figure 3: Online exposure changes across author groups with different numbers of fans.
  • Figure 4: Author-to-author retrieval results of different models for the trigger author shown on the left.
  • Figure 5: Case study of LARM's quantized semantic codes.
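Figures 1 and 2 describe why per-interval semantics matter: an LLM emits one embedding per 30 s of a live stream, and the aligned embedding is quantized into codes that record what the stream was about while a given user was watching. The summary does not specify the quantizer, so the sketch below assumes residual quantization over fixed, pre-trained codebooks (a common way to obtain coarse-to-fine hierarchical codes, not confirmed as LARM's method), takes already-aligned embeddings as input, and uses integer-second timestamps. `residual_quantize`, `codes_for_watch`, and the toy codebooks are hypothetical.

```python
from typing import List, Sequence, Tuple

WINDOW_SECONDS = 30  # cadence from the paper: one live-streaming embedding every 30 s

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vec: Vector, codebooks: List[List[Vector]]) -> Tuple[int, ...]:
    """Coarse-to-fine codes: each level encodes what earlier levels left unexplained."""
    residual = list(vec)
    codes = []
    for book in codebooks:
        dists = [sq_dist(residual, centroid) for centroid in book]
        idx = dists.index(min(dists))  # nearest centroid at this level
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tuple(codes)


def codes_for_watch(window_embs: List[Vector], stream_start: int,
                    watch_start: int, watch_end: int,
                    codebooks: List[List[Vector]]) -> List[Tuple[int, ...]]:
    """Codes of the 30 s windows overlapping [watch_start, watch_end), in seconds."""
    if watch_end <= watch_start:
        return []
    first = max(0, (watch_start - stream_start) // WINDOW_SECONDS)
    last = min(len(window_embs) - 1, (watch_end - 1 - stream_start) // WINDOW_SECONDS)
    return [residual_quantize(window_embs[w], codebooks) for w in range(first, last + 1)]


if __name__ == "__main__":
    codebooks = [
        [[0.0, 0.0], [10.0, 10.0]],            # level 1: coarse topic
        [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # level 2: finer residual
    ]
    # One (already aligned) embedding per 30 s window of a single live stream.
    windows = [[0.2, 0.1], [10.9, 10.1], [0.1, 0.8], [9.8, 11.0]]
    # Two users watch the same stream (started at t=1000) during different intervals.
    print(codes_for_watch(windows, 1000, 1035, 1095, codebooks))  # [(1, 1), (0, 2), (1, 2)]
    print(codes_for_watch(windows, 1000, 1000, 1030, codebooks))  # [(0, 0)]
```

The two users in the example watch the same author yet end up with different code sequences in their histories, which is the situation Figure 1 (ii) describes; storing short integer codes rather than full embeddings is what keeps such histories compact.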