EnhancedRL: An Enhanced-State Reinforcement Learning Algorithm for Multi-Task Fusion in Recommender Systems

Peng Liu, Cong Xu, Jiawei Zhu, Ming Zhao, Bin Wang

TL;DR

EnhancedRL addresses the limitation of prior RL-MTF methods that rely solely on user features by introducing an enhanced state that also includes item features and context, and by producing per-item actions rather than a single user-level action. It redefines the actor-critic framework for user-item pair granularity within a session, with a tailored online exploration strategy and bound-aware learning to maximize long-term rewards. Extensive offline and online experiments in Tencent's industrial RS show substantial gains in both cumulative rewards and key engagement metrics, culminating in real-world deployment since September 2023. This work provides the first demonstration of maximizing long-term user satisfaction at the user-item pair level in RSs and offers a practical, scalable approach to improve multi-task fusion outcomes.

Abstract

As a key stage of Recommender Systems (RSs), Multi-Task Fusion (MTF) merges the multiple scores output by Multi-Task Learning (MTL) into a single score, which ultimately determines the recommendation results. Recently, Reinforcement Learning (RL) has been applied to MTF to maximize long-term user satisfaction within a recommendation session. However, due to limitations of their modeling paradigm, all existing RL algorithms for MTF can use only user features and statistical features as the state and generate actions at the user level; they cannot leverage item features or other valuable features, which leads to suboptimal performance. Overcoming this problem requires a breakthrough in the existing modeling paradigm, yet, to date, no prior work has addressed it. To tackle this challenge, we propose EnhancedRL, an innovative RL algorithm. Unlike existing RL-MTF methods, EnhancedRL takes the enhanced state as input, incorporating not only user features but also item features and other valuable information. Furthermore, it introduces a tailored actor-critic framework, including a redesigned actor and critics and a novel learning procedure, to optimize long-term rewards at the user-item pair level within a recommendation session. Extensive offline and online experiments conducted in an industrial RS demonstrate that EnhancedRL markedly outperforms other methods, achieving a +3.84% increase in user valid consumption and a +0.58% increase in user duration time. To the best of our knowledge, EnhancedRL is the first work to address this challenge, and it has been fully deployed in a large-scale RS since September 14, 2023, yielding significant improvements.
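
To make the difference between user-level and user-item-pair-level actions concrete, here is a minimal Python sketch. It rests on assumptions not stated in the abstract: the fusion rule (a weighted power product of MTL scores) is only one common choice and may differ from the paper's, the function names are hypothetical, and the action vectors are hand-picked constants standing in for actor outputs.

```python
import math
from typing import List, Sequence


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Merge several MTL scores into one ranking score.

    ASSUMPTION: a weighted power product, prod((1 + s_k) ** w_k).
    The abstract does not specify the fusion formula.
    """
    return math.prod((1.0 + s) ** w for s, w in zip(scores, weights))


def rank_user_level(item_scores: List[Sequence[float]],
                    user_action: Sequence[float]) -> List[int]:
    """Existing RL-MTF paradigm: one action shared by every candidate item."""
    fused = [fuse(s, user_action) for s in item_scores]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


def rank_pair_level(item_scores: List[Sequence[float]],
                    item_actions: List[Sequence[float]]) -> List[int]:
    """Pair-level paradigm: a separate action for each user-item pair."""
    fused = [fuse(s, a) for s, a in zip(item_scores, item_actions)]
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)


# Two candidate items, each with two hypothetical MTL scores
# (for example, a click score and a completion score).
item_scores = [[0.9, 0.1], [0.2, 0.8]]

# One weight vector for the whole user request.
user_action = [1.0, 1.0]

# One weight vector per item; in a pair-level method these would be
# produced from user, item, and context features (the enhanced state).
item_actions = [[2.0, 1.0], [1.0, 1.0]]

print(rank_user_level(item_scores, user_action))
print(rank_pair_level(item_scores, item_actions))
```

With a shared action, the ranking depends only on how the fixed weights trade off the scores. With per-item actions, the weights themselves can vary across candidates, which is the extra degree of freedom the enhanced state is meant to exploit.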

Paper Structure

This paper contains 24 sections, 10 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: The interactions between a user and an RS within a recommendation session.
  • Figure 2: The modeling paradigm of existing RL-MTF methods, which can use only user-level features as the state and generate actions at the user level within a recommendation session.
  • Figure 3: Comparison between the action distribution of our custom exploration policy and that of the Gaussian-noise exploration policy.
  • Figure 4: The framework of EnhancedRL, which consists of an actor and $q$ sets of critics, along with corresponding target actor and target critics.
  • Figure 5: Hierarchical state in EnhancedRL: an enhanced state comprises user features, item features, and other contextual features; the state of a recommendation list consists of $l$ enhanced states.
  • ...and 1 more figure