MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion

Jiaxin Deng; Shiyao Wang; Yuchen Wang; Jiansong Qi; Liqin Zhao; Guorui Zhou; Gaofeng Meng

TL;DR

MMBee tackles live streaming gifting prediction by jointly modeling real-time multi-modal content and expanding sparse user–author interactions through graph-guided behavior expansion. It introduces Multi-modal Fusion with Learnable Query (MFQ) to fuse visual, audio, and textual signals and Graph-guided Interest Expansion (GIE) to enrich representations via User-to-Author and Author-to-Author graphs with GraphCL pre-training and metapath-based expansion. The approach yields significant offline gains on Kuaishou's large-scale industrial dataset and public benchmarks, with online A/B tests showing meaningful improvements in engagement and revenue, and has been deployed to serve hundreds of millions of users. These results demonstrate the practicality of decoupled offline graph learning and online inference for low-latency, multi-modal gifting prediction in large-scale live streaming platforms.
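GraphCL pre-training learns node representations by pulling together two randomly augmented views of the same graph and pushing apart views of different nodes. The sketch below is a simplified, node-level illustration of that contrastive idea, not MMBee's implementation. It assumes edge dropping as the only augmentation, a single mean-aggregation layer as the encoder, and an InfoNCE-style loss in which node i in one view is the positive for node i in the other. GraphCL itself contrasts pooled (sub)graph embeddings. The function names, `drop_prob`, and the temperature `tau` are illustrative choices.

```python
import numpy as np


def drop_edges(adj, drop_prob, rng):
    """Edge-perturbation augmentation: drop each undirected edge with prob. drop_prob."""
    keep = np.triu(rng.random(adj.shape) >= drop_prob, k=1)
    keep = keep | keep.T                      # keep the adjacency symmetric
    return adj * keep


def encode(adj, feats, weight):
    """One mean-aggregation layer with self-loops, a linear map and tanh."""
    a_hat = adj + np.eye(adj.shape[0])
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)
    return np.tanh(a_norm @ feats @ weight)


def info_nce(z1, z2, tau=0.5):
    """Row i of z1 and row i of z2 form the positive pair; other rows of z2 are negatives."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    sim = z1 @ z2.T / tau                          # (n, n) scaled cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)     # stabilize the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def graphcl_loss(adj, feats, weight, rng, drop_prob=0.2, tau=0.5):
    """Contrast two independently augmented views of the same graph."""
    z1 = encode(drop_edges(adj, drop_prob, rng), feats, weight)
    z2 = encode(drop_edges(adj, drop_prob, rng), feats, weight)
    return info_nce(z1, z2, tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_in, d_out = 6, 5, 4
    upper = np.triu(rng.random((n, n)) < 0.5, k=1).astype(float)
    adj = upper + upper.T                     # random undirected graph
    feats = rng.normal(size=(n, d_in))
    weight = rng.normal(size=(d_in, d_out))
    print(graphcl_loss(adj, feats, weight, rng))
```

In a real pre-training loop the encoder weights would be updated by gradient descent on this loss, and the resulting embeddings stored for later use. The sketch only evaluates the objective once.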

Abstract

Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem and model users' preferences using categorical data and observed historical behaviors. However, it is challenging to precisely describe the real-time content changes in live streaming using limited categorical information. Moreover, due to the sparsity of gifting behaviors, capturing the preferences and intentions of users is quite difficult. In this work, we propose MMBee based on real-time Multi-Modal Fusion and Behaviour Expansion to address these issues. Specifically, we first present a Multi-modal Fusion Module with Learnable Query (MFQ) to perceive the dynamic content of streaming segments and process complex multi-modal interactions, including images, text comments and speech. To alleviate the sparsity issue of gifting behaviors, we present a novel Graph-guided Interest Expansion (GIE) approach that learns both user and streamer representations on large-scale gifting graphs with multi-modal attributes. Comprehensive experimental results show that MMBee achieves significant performance improvements on both public datasets and Kuaishou's real-world streaming datasets, and its effectiveness has been further validated through online A/B experiments. MMBee has been deployed and is serving hundreds of millions of users at Kuaishou.
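MFQ is described as using learnable queries to read the dynamic content of a streaming segment across images, text comments and speech. A common way to realize "learnable query" fusion is cross-attention pooling, where a small, fixed set of K trainable query vectors attends over the concatenated tokens of all modalities. The sketch below shows only that generic mechanism under assumed shapes. The projection matrices, K, and the random inputs are illustrative, and MMBee's own query parameterization (the Figure 5 caption, where each point is an author, suggests the queries vary by author) and modality encoders are not reproduced.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def learnable_query_fusion(queries, modality_tokens, w_q, w_k, w_v):
    """Cross-attention from K query vectors to every multi-modal token.

    queries:         (K, d) trainable query vectors
    modality_tokens: list of (N_m, d) arrays, one per modality
    returns:         fused (K, d) output and (K, N) attention weights, N = sum of N_m
    """
    tokens = np.concatenate(modality_tokens, axis=0)         # (N, d)
    q, k, v = queries @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each query's weights sum to 1
    return attn @ v, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_queries = 8, 4
    frames = rng.normal(size=(5, d))     # e.g. visual frame embeddings
    comments = rng.normal(size=(3, d))   # e.g. text comment embeddings
    speech = rng.normal(size=(6, d))     # e.g. speech/audio embeddings
    queries = rng.normal(size=(n_queries, d))
    w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
    fused, attn = learnable_query_fusion(queries, [frames, comments, speech], w_q, w_k, w_v)
    print(fused.shape, attn.shape)       # (4, 8) (4, 14)
```

Because K is fixed, the fused output has the same size no matter how many frames, comments or speech tokens a segment contains, which makes it easy to flatten or pool and feed to a downstream gifting predictor.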

Paper Structure (23 sections, 13 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Related Work
Live Streaming Gifting Recommendation
Personalized Recommendation
Preliminaries
Multi-modal Fusion with Learnable Query
Graph-guided Interest Expansion
User-to-Author and Author-to-Author Graph
Node Representation Pre-training with GraphCL
Metapath-guided Behavior Expansion through End-to-End Training
System Deployment
Experiment
Dataset
Kuaishou Dataset
Public Dataset

Figures (6)

Figure 1: Example of the live streaming gifting scenario with the interactions among users and streamers.
Figure 2: The overall framework of MMBee, which consists of two stages: (i) the offline Graph-guided Interest Expansion (GIE) stage, which performs behavior expansion based on the target user and author; (ii) the online GTR prediction stage, which aggregates the real-time multi-modal content and the expanded behaviors for end-to-end training.
Figure 3: User-to-author and author-to-author donation graph construction with donation history.
Figure 4: The deployment of MMBee in online live streaming GTR prediction system.
Figure 5: Visualization of the learnable query distribution in MFQ, where each point indicates an author.
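Figure 3 describes building the User-to-Author (U2A) and Author-to-Author (A2A) graphs from donation history. The sketch below assumes one plausible construction, which may differ from MMBee's: U2A edges are weighted by how many gifts a user sent to an author, and two authors are linked in A2A with a weight equal to the number of distinct donors they share. The event format `(user_id, author_id)` and the function name `build_gifting_graphs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_gifting_graphs(donations):
    """Build weighted U2A and A2A adjacency dicts from (user_id, author_id) gift events.

    U2A weight: number of gifts the user sent to the author.
    A2A weight: number of distinct users who gifted both authors (assumed rule).
    """
    u2a = defaultdict(lambda: defaultdict(int))
    for user, author in donations:
        u2a[user][author] += 1

    a2a = defaultdict(lambda: defaultdict(int))
    for authors in u2a.values():
        # every pair of authors sharing this donor gets +1 (quadratic in len(authors))
        for a, b in combinations(sorted(authors), 2):
            a2a[a][b] += 1
            a2a[b][a] += 1

    # convert to plain dicts so later lookups do not silently create keys
    return ({u: dict(nbrs) for u, nbrs in u2a.items()},
            {a: dict(nbrs) for a, nbrs in a2a.items()})


if __name__ == "__main__":
    events = [("u1", "a1"), ("u1", "a2"), ("u1", "a1"), ("u2", "a2"), ("u2", "a3")]
    u2a, a2a = build_gifting_graphs(events)
    print(u2a)  # {'u1': {'a1': 2, 'a2': 1}, 'u2': {'a2': 1, 'a3': 1}}
    print(a2a)  # {'a1': {'a2': 1}, 'a2': {'a1': 1, 'a3': 1}, 'a3': {'a2': 1}}
```

Pairing every author a user has gifted is quadratic in that user's author count, so at industrial scale one would likely cap or sample heavy donors before building A2A edges.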

Theorems & Definitions (2)

Definition 1: Metapath [fan2019metapath]
Definition 2: Metapath-guided Neighbors [fan2019metapath]
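Definitions 1 and 2 follow the metapath formulation cited as fan2019metapath, in which metapath-guided neighbors are, roughly, the nodes visited when a starting node walks along a fixed sequence of node types. The sketch below assumes that reading and returns the set of nodes reached at each step, with step 0 being the start node itself. The type labels "U" (user) and "A" (author), the `edges` layout, and the example metapath U-A-A (a user's gifted authors, then authors linked to them in A2A) are illustrative assumptions rather than the paper's exact metapath set.

```python
def metapath_neighbors(start, metapath, edges):
    """Nodes visited when `start` walks along `metapath`, one set per step.

    metapath: tuple of node types, e.g. ("U", "A", "A")
    edges:    dict (src_type, dst_type) -> {src_id: iterable of dst_ids}
    returns:  list where result[0] == {start} and result[i] holds step-i nodes
    """
    steps = [{start}]
    for src_type, dst_type in zip(metapath, metapath[1:]):
        adjacency = edges.get((src_type, dst_type), {})
        reached = set()
        for node in steps[-1]:
            reached.update(adjacency.get(node, ()))
        steps.append(reached)
    return steps


if __name__ == "__main__":
    edges = {
        ("U", "A"): {"u1": {"a1", "a2"}},                # authors the user gifted
        ("A", "A"): {"a1": {"a3"}, "a2": {"a3", "a4"}},  # hypothetical A2A links
    }
    for i, nodes in enumerate(metapath_neighbors("u1", ("U", "A", "A"), edges)):
        print(i, sorted(nodes))
    # 0 ['u1']
    # 1 ['a1', 'a2']
    # 2 ['a3', 'a4']
```

On a large gifting graph the per-step sets can grow quickly, so a practical expansion would probably keep only the top-weighted neighbors at each step; the sketch keeps all of them for clarity.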
