Future-Conditioned Recommendations with Multi-Objective Controllable Decision Transformer

Chongming Gao, Kexin Huang, Ziang Fei, Jiaju Chen, Jiawei Chen, Jianshan Sun, Shuchang Liu, Qingpeng Cai, Peng Jiang

TL;DR

This paper tackles the challenge of optimizing long-term user satisfaction in recommender systems under multiple, potentially conflicting objectives. It proposes a future-conditioned framework called MocDT, which uses a control signal to steer a Decision Transformer-based model toward specified objectives, enabling inference-time customization without retraining. The method treats future objectives as prompts that influence trajectory generation, and augments offline data with synthetic sequences to mitigate dataset limitations. Empirical results across MovieLens, KuaiRand, and Zhihu demonstrate MocDT’s ability to balance rating and diversity while maintaining competitive performance, with data augmentation providing dataset-dependent gains and insights into objective control. The work advances controllable, multi-objective recommendation by integrating offline RL, transformer-based sequence modeling, and careful data augmentation strategies, offering practical implications for flexible, long-horizon optimization in industrial settings.

Abstract

Securing long-term success is the ultimate aim of recommender systems, demanding strategies capable of foreseeing and shaping the impact of decisions on future user satisfaction. Current recommendation strategies grapple with two significant hurdles. Firstly, the future impacts of recommendation decisions remain obscured, rendering it impractical to evaluate them through direct optimization of immediate metrics. Secondly, conflicts often emerge between multiple objectives, like enhancing accuracy versus exploring diverse recommendations. Existing strategies, trapped in a "training, evaluation, and retraining" loop, grow more labor-intensive as objectives evolve. To address these challenges, we introduce a future-conditioned strategy for multi-objective controllable recommendations, allowing for the direct specification of future objectives and empowering the model to generate item sequences that align with these goals autoregressively. We present the Multi-Objective Controllable Decision Transformer (MocDT), an offline Reinforcement Learning (RL) model capable of autonomously learning the mapping from multiple objectives to item sequences, leveraging extensive offline data. Consequently, it can produce recommendations tailored to any specified objectives during the inference stage. Our empirical findings emphasize the controllable recommendation strategy's ability to produce item sequences according to different objectives while maintaining performance that is competitive with current recommendation strategies across various objectives.
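The idea of specifying future objectives and letting the model produce a matching sequence can be illustrated with a toy sketch. It is not the MocDT implementation: the `Interaction` type, the formulas inside `control_signal` (mean normalized rating, fraction of distinct categories), and the nearest-neighbour lookup that stands in for the learned transformer are all assumptions made for illustration. What it tries to show is hindsight labeling: each logged trajectory is tagged with the objectives it actually achieved, so a model trained on such (signal, history, next item) triples could later be prompted with a desired signal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interaction:
    item_id: int
    category: str
    rating: float  # assumed to lie in [0, max_rating]


def control_signal(traj: List[Interaction], max_rating: float = 5.0) -> Tuple[float, float]:
    """Hindsight label (assumed definitions): the (rating, diversity)
    objectives that a logged trajectory actually achieved, both in [0, 1]."""
    o_rating = sum(x.rating for x in traj) / (len(traj) * max_rating)
    o_div = len({x.category for x in traj}) / len(traj)
    return (o_rating, o_div)


def training_examples(traj: List[Interaction]):
    """Pair every prefix of the trajectory with the achieved signal;
    the prediction target is the next item in the log."""
    signal = control_signal(traj)
    return [
        (signal, [x.item_id for x in traj[:t]], traj[t].item_id)
        for t in range(len(traj))
    ]


def closest_trajectory(logs: List[List[Interaction]], target: Tuple[float, float]) -> List[int]:
    """Stand-in for the learned model: return the logged item sequence whose
    achieved signal is nearest (squared Euclidean distance) to the target."""
    def dist(traj: List[Interaction]) -> float:
        r, d = control_signal(traj)
        return (r - target[0]) ** 2 + (d - target[1]) ** 2

    return [x.item_id for x in min(logs, key=dist)]


# Two hypothetical logged sessions: one highly rated but narrow, one diverse.
narrow = [Interaction(1, "action", 5.0), Interaction(2, "action", 5.0),
          Interaction(3, "action", 5.0)]
diverse = [Interaction(4, "drama", 2.0), Interaction(5, "comedy", 3.0),
           Interaction(6, "horror", 1.0)]
logs = [narrow, diverse]

rating_first = closest_trajectory(logs, (1.0, 0.0))     # prioritize rating
diversity_first = closest_trajectory(logs, (0.0, 1.0))  # prioritize diversity
```

Changing only the target tuple changes which sequence comes back, with no retraining step in between. That is the behaviour the abstract attributes to inference-time control, here reduced to a lookup over two sessions.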

Paper Structure

This paper contains 25 sections, 6 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Illustration of two paradigms in recommendation.
  • Figure 2: Architecture of the MocDT method.
  • Figure 3: Evaluation results conditioned on nine prioritized objectives. Here, the notation (1.0, 0.0) corresponds to scenarios where $O_{\text{rating}} = 1$ and $O_{\text{div}} = 0$, indicating a focus on rating maximization with no emphasis on diversity.
  • Figure 4: Effect of augmentation strategies.
  • Figure 5: Effect of augmentation rates.

Theorems & Definitions (1)

  • Definition 1: Control Signal