Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention

Ziru Liu; Shuchang Liu; Zijian Zhang; Qingpeng Cai; Xiangyu Zhao; Kesen Zhao; Lantao Hu; Peng Jiang; Kun Gai

TL;DR

DT4IER addresses the challenge of balancing short-term user engagement and long-term retention in sequential recommendation. It builds on a Decision Transformer backbone with a multi-reward design, an adaptive return-to-go (RTG) balancing module conditioned on user features, and a high-dimensional reward embedding trained with a contrastive loss term. Empirical results on three real-world datasets show consistent improvements over state-of-the-art sequential recommender system (SRS) and multi-task learning (MTL) baselines in both prediction accuracy and retention metrics, with ablation analyses confirming the contribution of each component. The work offers a practical pathway to optimizing immediate feedback and user retention together in industrial recommender system deployments.

Abstract

In Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily due to its proficiency in optimizing long-term rewards. Nevertheless, it suffers from instability in the learning process, stemming from the intricate interactions among bootstrapping, off-policy training, and function approximation. Moreover, in multi-reward recommendation scenarios, designing a reward setting that reconciles the inner dynamics of the various tasks is difficult. In response to these challenges, we introduce DT4IER, a decision transformer-based recommendation model engineered not only to improve the effectiveness of recommendations but also to balance immediate user engagement against long-term retention. DT4IER applies a multi-reward design that balances short-term and long-term rewards using user-specific attributes, which enrich the context of the reward sequence and support a more informed and personalized recommendation process. To enhance its predictive capabilities, DT4IER incorporates a high-dimensional encoder designed to identify and leverage the interrelations across tasks. Furthermore, we integrate a contrastive learning approach into the action embedding predictions, a strategy that significantly boosts the model's overall performance. Experiments on three real-world datasets demonstrate the effectiveness of DT4IER against state-of-the-art Sequential Recommender Systems (SRSs) and Multi-Task Learning (MTL) models in terms of both prediction accuracy and effectiveness in specific tasks. The source code is accessible online to facilitate replication.
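
To make the multi-reward idea concrete, the following is a minimal sketch of how per-task return-to-go (RTG) sequences could be computed and then combined, which is the kind of signal a Decision Transformer conditions on. It is an illustration only and not the authors' implementation: the function names `returns_to_go` and `balanced_rtg`, the toy session, and the fixed weights are all assumptions. In particular, DT4IER's balancing module is described as adaptive and conditioned on user features, whereas this sketch uses hand-set weights as a stand-in.

```python
def returns_to_go(rewards):
    """Per-task suffix sums: rtg[t][k] = sum of rewards[t:][k].

    `rewards` is a list of per-step tuples, one entry per task
    (here assumed to be immediate feedback and a retention signal).
    """
    running = [0.0] * len(rewards[0])
    out = []
    # Walk the session backwards, accumulating the remaining reward.
    for step in reversed(rewards):
        running = [acc + r for acc, r in zip(running, step)]
        out.append(tuple(running))
    out.reverse()
    return out


def balanced_rtg(rtg, weights):
    """Combine per-task RTG values into one scalar per step.

    `weights` is a hypothetical fixed weighting; the paper's module
    is adaptive and user-conditioned, which this does not model.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    return [sum(w * v for w, v in zip(norm, step)) for step in rtg]


# Toy session of 4 steps: (immediate feedback, retention signal).
rewards = [(1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
rtg = returns_to_go(rewards)
mixed = balanced_rtg(rtg, (0.5, 0.5))
```

The retention column stays at 1.0 for every step because the only retention reward arrives at the end of the session, which illustrates why a long-term signal is sparse compared with immediate feedback and why the two need to be balanced rather than simply summed.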
