Large Language Models are Learnable Planners for Long-Term Recommendation

Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng

TL;DR

This work tackles long-term engagement in interactive recommendation by leveraging the planning capabilities of Large Language Models. It introduces BiLLP, a bi-level framework that separates macro-level guidance (Planner and Reflector) from micro-level personalization (Actor and Critic), enabling efficient learning from sparse data through memory-augmented prompting and in-context updates. Empirical results in simulated environments show that BiLLP outperforms RL baselines and other LLM approaches in trajectory length and cumulative reward, with ablations confirming the value of macro-learning, micro-learning, and Critic-based variance reduction. The framework is robust across environments and base models, highlighting the practical potential of LLM-driven planning for long-term recommendation systems.

Abstract

Planning for both immediate and long-term benefits becomes increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data presents challenges such as instability and susceptibility to overfitting when training RL models from scratch, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities over sparse data of Large Language Models (LLMs) for long-term recommendation. The key to achieving the target lies in formulating a guidance plan following principles of enhancing long-term engagement and grounding the plan to effective and executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.
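To make the bi-level structure concrete, the following is a minimal Python sketch of how the four roles named in the TL;DR (Planner, Reflector, Actor, Critic) might interact through two text memories. It is an illustration under stated assumptions, not the authors' implementation, which is in the linked repository. All names here (Memory, build_prompt, run_episode, toy_llm, toy_env_step) are hypothetical, the "LLM" is a rule-based stand-in, retrieval is simply "most recent entries", and the toy environment ends a session when the same item is recommended twice in a row. The Critic's estimate is only logged; the variance reduction mentioned in the TL;DR is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # hypothetical stand-in for any text-in/text-out model


@dataclass
class Memory:
    """Append-only text memory; retrieval is simply the most recent k entries."""
    entries: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def retrieve(self, k: int = 2) -> List[str]:
        return self.entries[-k:]


def build_prompt(role: str, context: str, memory: Memory) -> str:
    """Memory-augmented prompt: retrieved entries are placed before the context."""
    hints = "\n".join(memory.retrieve())
    return f"[{role}]\nHints:\n{hints}\nContext:\n{context}"


def run_episode(llm: LLM, macro_mem: Memory, micro_mem: Memory,
                env_step: Callable[[List[str], str], Tuple[float, bool]],
                max_steps: int = 10) -> Tuple[int, float]:
    """Run one session; return (trajectory length, cumulative reward)."""
    # Macro level: the Planner drafts guidance using past reflections.
    plan = llm(build_prompt("Planner", "start of session", macro_mem))
    history: List[str] = []
    total = 0.0
    for _ in range(max_steps):
        context = f"plan={plan}; history={history}"
        # Micro level: the Actor grounds the plan into a concrete action.
        action = llm(build_prompt("Actor", context, micro_mem))
        value = llm(build_prompt("Critic", context, micro_mem))  # logged only
        reward, done = env_step(history, action)
        history.append(action)
        total += reward
        # In-context update: no gradient step, just a new memory entry.
        micro_mem.add(f"action={action} reward={reward} critic={value}")
        if done:
            break
    # Macro level: the Reflector summarizes the finished trajectory.
    summary = f"history={history}; total={total}"
    macro_mem.add(llm(build_prompt("Reflector", summary, macro_mem)))
    return len(history), total


def toy_llm(prompt: str) -> str:
    """Rule-based stand-in for an LLM (assumption, for illustration only)."""
    if prompt.startswith("[Planner]"):
        return "alternate genres" if "diversify" in prompt else "repeat favorite"
    if prompt.startswith("[Actor]"):
        if "plan=alternate genres" in prompt:
            return "comedy" if prompt.endswith("'drama']") else "drama"
        return "drama"
    if prompt.startswith("[Critic]"):
        return "0.5"
    return "diversify: repetition ended the session early"  # Reflector


def toy_env_step(history: List[str], action: str) -> Tuple[float, bool]:
    """Toy user: leaves (reward 0) when the same item is repeated back to back."""
    done = bool(history) and history[-1] == action
    return (0.0 if done else 1.0), done


if __name__ == "__main__":
    macro, micro = Memory(), Memory()
    for episode in range(2):
        length, reward = run_episode(toy_llm, macro, micro, toy_env_step)
        print(f"episode {episode}: length={length}, cumulative reward={reward}")
```

In this toy setup the first session ends after two steps because the Planner has no reflections to draw on. The second session runs to the ten-step cap because the stored reflection changes the plan. The point of the sketch is only that behavior changes through memory entries rather than parameter updates.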

Paper Structure

This paper contains 26 sections, 18 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: The overview of the proposed BiLLP framework. The black line indicates that the data serves as a prompt input for the subsequent module. The red line denotes that the data is utilized to update the memory of the subsequent module.
  • Figure 2: The memory-based learning methods and policy gradient-based methods have a comparable impact on the Actor policy.
  • Figure 3: The frequency distribution of items recommended by our method and A2C in the two environments.
  • Figure 4: The memory-based in-context learning methods and policy gradient-based methods have a comparable impact on the Actor policy.
  • Figure 5: Results under different simulated environments.