Large Language Models are Learnable Planners for Long-Term Recommendation
Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, Fuli Feng
TL;DR
This work tackles long-term engagement in interactive recommendation by leveraging the planning capabilities of Large Language Models (LLMs). It introduces BiLLP, a bi-level framework that separates macro-level guidance (Planner and Reflector) from micro-level personalization (Actor and Critic), enabling efficient learning from sparse data through memory-augmented prompting and in-context updates. In simulated environments, BiLLP outperforms Reinforcement Learning (RL) baselines and other LLM-based approaches in trajectory length and cumulative reward; ablations confirm the value of both macro-learning and micro-learning, and of the Critic-based variance reduction. The framework is robust across environments and base models, which points to the practical potential of LLM-driven planning for long-term recommendation.
Abstract
Planning for both immediate and long-term benefits is becoming increasingly important in recommendation. Existing methods apply Reinforcement Learning (RL) to learn planning capacity by maximizing cumulative reward for long-term recommendation. However, the scarcity of recommendation data makes training RL models from scratch unstable and prone to overfitting, resulting in sub-optimal performance. In this light, we propose to leverage the remarkable planning capabilities of Large Language Models (LLMs) over sparse data for long-term recommendation. The key to achieving this goal lies in formulating a guidance plan that follows principles for enhancing long-term engagement, and in grounding that plan in effective, executable actions in a personalized manner. To this end, we propose a Bi-level Learnable LLM Planner (BiLLP) framework, which consists of a set of LLM instances and breaks down the learning process into macro-learning and micro-learning to learn macro-level guidance and micro-level personalized recommendation policies, respectively. Extensive experiments validate that the framework facilitates the planning ability of LLMs for long-term recommendation. Our code and data can be found at https://github.com/jizhi-zhang/BiLLP.
