A Model-based Multi-Agent Personalized Short-Video Recommender System
Peilun Zhou, Xiaoxiao Xu, Lantao Hu, Han Li, Peng Jiang
TL;DR
The paper tackles session-level optimization for short-video recommendations under a reinforcement learning framework. It introduces a model-based, collaborative multi-agent ranking method (MMRF) that jointly maximizes WatchTime and auxiliary user interactions through an attentive information-sharing mechanism. To combat sample selection bias, it injects non-impression samples and trains a feedback-fitting model to simulate user responses, enabling robust offline policy learning; uncertainty is captured via a Siamese-style structure to guide exploration. Extensive offline evaluations on public and production data, along with online A/B tests, demonstrate significant performance gains over strong baselines, and the method is deployed in a large-scale platform serving hundreds of millions of users. The approach advances practical RL-based recommender systems by combining collaborative multi-agent planning with model-based bias mitigation and production-ready deployment.
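The TL;DR mentions that uncertainty is captured by a Siamese-style structure to guide exploration, without giving details. The sketch below shows one common way such a signal can be used, and it is an assumption rather than the paper's architecture: two identically shaped scorers (here hypothetical linear heads with weights `w_a` and `w_b`) score each candidate, their disagreement serves as an uncertainty estimate, and a bonus weighted by a hypothetical `beta` is added to the mean score in an optimism-under-uncertainty style.

```python
from typing import List, Sequence, Tuple


def dot(x: Sequence[float], w: Sequence[float]) -> float:
    """Inner product of a feature vector and a weight vector."""
    return sum(xi * wi for xi, wi in zip(x, w))


def explore_scores(
    items: Sequence[Sequence[float]],
    w_a: Sequence[float],
    w_b: Sequence[float],
    beta: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Score candidates with two twin heads (hypothetical, not the paper's model).

    Returns (exploration scores, uncertainties), where
    uncertainty = |pred_a - pred_b| and score = mean(pred) + beta * uncertainty.
    """
    scores: List[float] = []
    uncertainties: List[float] = []
    for x in items:
        pred_a, pred_b = dot(x, w_a), dot(x, w_b)
        uncertainty = abs(pred_a - pred_b)  # twin disagreement as uncertainty
        scores.append(0.5 * (pred_a + pred_b) + beta * uncertainty)
        uncertainties.append(uncertainty)
    return scores, uncertainties


# Both candidates have the same mean prediction (1.0), but the twins disagree
# on the second one, so the exploration bonus ranks it higher.
scores, unc = explore_scores([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, 0.0], beta=0.5)
print(scores, unc)  # [1.0, 2.0] [0.0, 2.0]
```

Setting `beta` to 0 recovers pure exploitation on the averaged prediction. Larger values favor items the twins disagree on, which are typically the ones the logged data covers poorly.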
Abstract
A recommender system selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process (MDP) and solving it within a reinforcement learning (RL) framework has attracted increasing attention from both academia and industry. In this paper, we propose an RL-based industrial short-video recommender ranking framework that models and maximizes user watch time, in an environment of multi-aspect user preferences, through a collaborative multi-agent formulation. Moreover, our framework adopts a model-based learning approach to alleviate sample selection bias, a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our method over alternatives. Our approach has been deployed on our large-scale short-video sharing platform, successfully serving hundreds of millions of users.
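To make the session-as-MDP framing concrete, the sketch below computes the session-level objective an RL ranker would maximize. It assumes that each request is one MDP step, that the per-request reward is the total watch time of the K items shown at that request, and that future requests are discounted by a hypothetical factor `gamma`. These are illustrative choices, not the paper's exact reward design.

```python
from typing import List, Sequence


def request_rewards(session: Sequence[Sequence[float]]) -> List[float]:
    """Per-request reward: summed watch time of the top-K items shown (assumption)."""
    return [sum(item_watch_times) for item_watch_times in session]


def session_returns(rewards: Sequence[float], gamma: float = 0.9) -> List[float]:
    """Discounted return G_t = r_t + gamma * G_{t+1}, computed backward over requests."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A three-request session with K = 2 items per request (watch times in seconds).
session = [[3.0, 7.0], [12.0, 8.0], [5.0, 0.0]]
rewards = request_rewards(session)          # [10.0, 20.0, 5.0]
print(session_returns(rewards, gamma=0.5))  # [21.25, 22.5, 5.0]
```

The return at the first request credits the ranking decision there with the watch time it leads to later in the session, not just the immediate watch time. This is the difference between session-level and request-level optimization.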
