MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
Jiaxin Deng, Shiyao Wang, Yuchen Wang, Jiansong Qi, Liqin Zhao, Guorui Zhou, Gaofeng Meng
TL;DR
MMBee tackles live streaming gifting prediction by jointly modeling real-time multi-modal content and expanding sparse user–author interactions through graph-guided behavior expansion. It introduces Multi-modal Fusion with Learnable Query (MFQ) to fuse visual, audio, and textual signals and Graph-guided Interest Expansion (GIE) to enrich representations via User-to-Author and Author-to-Author graphs with GraphCL pre-training and metapath-based expansion. The approach yields significant offline gains on Kuaishou's large-scale industrial dataset and public benchmarks, with online A/B tests showing meaningful improvements in engagement and revenue, and has been deployed to serve hundreds of millions of users. These results demonstrate the practicality of decoupled offline graph learning and online inference for low-latency, multi-modal gifting prediction in large-scale live streaming platforms.
Abstract
Live streaming services are becoming increasingly popular due to real-time interactions and entertainment. Viewers can chat and send comments or virtual gifts to express their preferences for the streamers. Accurately modeling the gifting interaction not only enhances users' experience but also increases streamers' revenue. Previous studies on live streaming gifting prediction treat this task as a conventional recommendation problem, and model users' preferences using categorical data and observed historical behaviors. However, it is challenging to precisely describe the real-time content changes in live streaming using limited categorical information. Moreover, because gifting behaviors are sparse, capturing the preferences and intentions of users is quite difficult. In this work, we propose MMBee, which is based on real-time Multi-Modal Fusion and Behaviour Expansion, to address these issues. Specifically, we first present a Multi-modal Fusion Module with Learnable Query (MFQ) to perceive the dynamic content of streaming segments and process complex multi-modal interactions, including images, text comments and speech. To alleviate the sparsity issue of gifting behaviors, we present a novel Graph-guided Interest Expansion (GIE) approach that learns both user and streamer representations on large-scale gifting graphs with multi-modal attributes. Comprehensive experimental results show that MMBee achieves significant performance improvements on both public datasets and Kuaishou's real-world streaming datasets, and its effectiveness has been further validated through online A/B experiments. MMBee has been deployed and is serving hundreds of millions of users at Kuaishou.
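To make the idea of fusing modalities through a learnable query concrete, the sketch below shows one common way such a module could work: a small set of query vectors attends over the concatenated frame, speech, and comment features of a streaming segment. This is an illustration only, not the MFQ implementation. The abstract does not specify the architecture, so the single-head cross-attention, the function names (`fuse_with_queries`, `rand_vecs`), the dimensions, and the random stand-in features are all assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_with_queries(queries, tokens):
    """Single-head cross-attention (assumed design, not taken from the paper).

    queries: list of d-dim vectors standing in for learnable parameters
    tokens:  list of d-dim vectors from all modalities, concatenated
    Returns (fused vectors, attention weights), one row per query.
    """
    dim = len(queries[0])
    fused, weights = [], []
    for q in queries:
        # Scaled dot-product scores of this query against every token.
        attn = softmax([dot(q, t) / math.sqrt(dim) for t in tokens])
        # Fused vector is the attention-weighted average of the tokens.
        fused.append([sum(w * t[i] for w, t in zip(attn, tokens))
                      for i in range(dim)])
        weights.append(attn)
    return fused, weights


rng = random.Random(0)
DIM = 8  # hypothetical feature size


def rand_vecs(n):
    """Random stand-ins for encoder outputs or query parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


# Hypothetical per-segment features; real ones would come from modality encoders.
frame_tokens = rand_vecs(3)
speech_tokens = rand_vecs(2)
comment_tokens = rand_vecs(4)
segment_tokens = frame_tokens + speech_tokens + comment_tokens

# Randomly initialised here; in a trained model these would be learned.
queries = rand_vecs(2)
fused, weights = fuse_with_queries(queries, segment_tokens)
```

In this sketch, each row of `weights` is a distribution over the nine segment tokens, so a query can draw on whichever modality is most informative for the current segment. The number of queries fixes the size of the fused output regardless of how many frames or comments the segment contains.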
