Reinforced Prompt Personalization for Recommendation with Large Language Models

Wenyu Mao; Jiancan Wu; Weijian Chen; Chongming Gao; Xiang Wang; Xiangnan He

TL;DR

The paper tackles the limitations of task-wise prompts in LLM-based recommendation by introducing Reinforced Prompt Personalization (RPP) and its enhanced version, RPP+. Framed as a multi-agent reinforcement learning (MARL) problem under Centralized Training with Decentralized Execution (CTDE), RPP personalizes four sentence-level prompt patterns (role-playing, history records, reasoning guidance, and output format) for individual users and concatenates them to guide a frozen LLM recommender. RPP+ additionally uses LLMs to dynamically refine the selected actions. The prompts are optimized via MARL to maximize a ranking reward, $r_t = \mathrm{NDCG@M}$ with $M=10$. Experiments on MovieLens-1M, Games, and Lastfm show strong improvements over traditional models, few-shot methods, and other prompt-based methods, and the results generalize across LLaMa2-7B-chat, ChatGPT, and Alpaca. Ablations, sensitivity analyses, case studies, and timing analyses support the effectiveness and practicality of instance-wise prompting for LLM-powered recommendation, showing that prompts can be tailored to diverse users while keeping computational costs manageable. Overall, the work advances prompt engineering by decomposing prompts into meaningful patterns and optimizing them via MARL to yield personalized, high-quality recommendations.
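The per-user reward is $r_t = \mathrm{NDCG@M}$ with $M=10$. The following is a minimal sketch of that reward, assuming binary relevance (an item is either the ground-truth next item or not) and the standard $1/\log_2(i+1)$ position discount; the function name `ndcg_at_m` and the item IDs are hypothetical, not taken from the RPP code.

```python
import math
from typing import AbstractSet, Sequence


def ndcg_at_m(ranked_items: Sequence[str], relevant: AbstractSet[str], m: int = 10) -> float:
    """NDCG@M with binary relevance.

    ranked_items: item IDs in the order the LLM recommender ranked them.
    relevant: ground-truth item IDs (in next-item recommendation, usually one item).
    """
    # DCG: a relevant item at 1-based position i contributes 1 / log2(i + 1);
    # with 0-based enumerate index i, that position is i + 1, hence log2(i + 2).
    dcg = sum(
        1.0 / math.log2(i + 2)
        for i, item in enumerate(ranked_items[:m])
        if item in relevant
    )
    # Ideal DCG: every relevant item placed at the top positions.
    ideal_hits = min(len(relevant), m)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# One user's reward: the ground-truth item "m5" is ranked third.
ranking = ["m7", "m2", "m5", "m9", "m1"]
reward = ndcg_at_m(ranking, {"m5"}, m=10)
print(round(reward, 4))
```

With a single ground-truth item the ideal DCG is 1, so the reward reduces to $1/\log_2(\text{rank}+1)$ when the item lands in the top $M$, and 0 otherwise.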

Abstract

Designing effective prompts can empower LLMs to understand user preferences and provide recommendations by drawing on their intent comprehension and knowledge utilization capabilities. Nevertheless, recent studies predominantly concentrate on task-wise prompting, developing fixed prompt templates shared across all users in a given recommendation task (e.g., rating or ranking). Although convenient, task-wise prompting overlooks individual user differences, leading to inaccurate analysis of user interests. In this work, we introduce the concept of instance-wise prompting, which aims to personalize discrete prompts for individual users. Toward this end, we propose Reinforced Prompt Personalization (RPP) to realize it automatically. To improve efficiency and quality, RPP personalizes prompts at the sentence level rather than searching the vast vocabulary word by word. Specifically, RPP breaks the prompt down into four patterns, tailors each pattern with its own agent in a multi-agent framework, and combines the results. The personalized prompts then interact with LLMs (the environment) iteratively to boost the LLMs' recommendation performance (the reward). Beyond RPP, to improve the scalability of the action space, we propose RPP+, which dynamically refines the selected actions with LLMs throughout the iterative process. Extensive experiments on various datasets demonstrate the superiority of RPP/RPP+ over traditional recommender models, few-shot methods, and other prompt-based methods, underscoring the significance of instance-wise prompting in LLMs for recommendation. Our code is available at https://github.com/maowenyu-11/RPP.
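Sentence-level personalization means each agent picks one sentence from a small pool for its pattern, and the four picks are concatenated. The sketch below illustrates that assembly step only. The pattern names follow the paper, but every candidate sentence, the `PATTERN_POOLS` and `build_prompt` names, and the choice to append the candidate list at the end are hypothetical illustrations, not RPP's actual action space.

```python
from typing import Dict, List, Sequence

# Hypothetical sentence pools, one per pattern; RPP's real pools are not shown here.
PATTERN_POOLS: Dict[str, List[str]] = {
    "role_playing": [
        "You are a movie recommender.",
        "You are a film critic who knows this user's taste well.",
    ],
    "history_records": [
        "The user recently watched: {history}.",
        "The user watched these movies, most recent last: {history}.",
    ],
    "reasoning_guidance": [
        "Rank the candidates directly.",
        "First summarize the user's preferred genres, then rank the candidates.",
    ],
    "output_format": [
        "Output a numbered list of candidate titles.",
        "Output only the candidate titles separated by commas.",
    ],
}
PATTERN_ORDER = ["role_playing", "history_records", "reasoning_guidance", "output_format"]


def build_prompt(actions: Dict[str, int], history: Sequence[str], candidates: Sequence[str]) -> str:
    """Concatenate one sentence per pattern, each chosen by that pattern's agent."""
    sentences = []
    for pattern in PATTERN_ORDER:
        template = PATTERN_POOLS[pattern][actions[pattern]]
        # Only history-record templates contain {history}; format() ignores unused kwargs.
        sentences.append(template.format(history=", ".join(history)))
    # Assumption: the candidate items are appended after the four patterns.
    sentences.append("Candidates: " + ", ".join(candidates) + ".")
    return " ".join(sentences)


# Action indices as four agents might output them for one user.
prompt = build_prompt(
    {"role_playing": 1, "history_records": 0, "reasoning_guidance": 1, "output_format": 0},
    history=["Heat", "Se7en"],
    candidates=["Fargo", "Toy Story"],
)
print(prompt)
```

Choosing among whole sentences keeps each agent's action space to a handful of options, which is the efficiency argument for sentence-level rather than word-by-word search.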

Paper Structure (29 sections, 8 equations, 14 figures, 8 tables, 1 algorithm)

Introduction
Preliminaries
Task-wise Prompting
Instance-wise Prompting
Methodology
Formulation of RL for Prompt Personalization
Action Space
State Space
Actor-Critic Based Architecture of the Multi-agent and Reward Function
Algorithm
Experiments
Experimental Settings
Datasets
Baselines
Frozen Pre-trained LLMs
...and 14 more sections

Figures (14)

Figure 1: Comparison between task-wise prompting and instance-wise prompting for recommendation. The ground truth for candidate movies that the users will watch next is marked in red. The personalized parts in prompts for different users are highlighted with yellow.
Figure 2: The framework of our proposed RPP/RPP+. MARL serves as the core component to personalize instance-wise prompts with four distinct patterns, using corresponding agents. It is trained iteratively to maximize rewards based on the outputs of the frozen LLM-based recommender. Once trained, MARL can select optimal actions from four patterns to generate personalized prompts for each user based on their data, effectively prompting the LLM-based recommender for tailored recommendations. In addition to RPP, the "Refine" block is designed for RPP+ to enhance the flexibility and quality of the selected actions, utilizing other LLMs to dynamically refine the selected actions before prompting the LLM-based recommender.
Figure 3: The performance comparison between prompt-based methods and RPP/RPP+ with different patterns. "Manual", "Enum", and "GRIPS" represent the baseline prompt-based methods. "RPP-Ro", "RPP-Hi", "RPP-Re", and "RPP-Ou" are the 4 variations of RPP/RPP+ on "role-playing", "history records", "reasoning guidance", and "output format" patterns, respectively.
Figure 4: Sensitivity to the number of training examples, which demonstrates the changes in RPP/RPP+'s performance on three datasets as the number of training examples increases from $100$ to $500$.
Figure 5: Sensitivity to the number of candidate items, which indicates the drop in performance of LLM-based methods as the number of candidate items increases.
...and 9 more figures
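Figure 2 describes agents that are trained iteratively to maximize rewards computed from the frozen LLM recommender's outputs. The loop below is a heavily simplified sketch of that interaction and is not the paper's method. It swaps the actor-critic architecture for tabular REINFORCE with one shared moving-average baseline (a crude stand-in for a centralized critic), ignores the per-user state, omits the RPP+ Refine step, and replaces the LLM call with a hypothetical `toy_reward` function. `PatternAgent`, `train`, and `toy_reward` are illustrative names only.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

PATTERNS = ["role_playing", "history_records", "reasoning_guidance", "output_format"]


def softmax(logits: Sequence[float]) -> List[float]:
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class PatternAgent:
    """One agent per pattern: a softmax policy over that pattern's sentences.

    Tabular and state-free for brevity; RPP conditions its agents on a user state.
    """

    def __init__(self, n_actions: int, lr: float = 0.5):
        self.logits = [0.0] * n_actions
        self.lr = lr

    def act(self, rng: random.Random) -> int:
        probs = softmax(self.logits)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, action: int, advantage: float) -> None:
        # REINFORCE: d log pi(a) / d logit_k = 1[k == a] - pi(k)
        probs = softmax(self.logits)
        for k in range(len(self.logits)):
            grad = (1.0 if k == action else 0.0) - probs[k]
            self.logits[k] += self.lr * advantage * grad


def train(agents: Dict[str, PatternAgent],
          reward_fn: Callable[[Dict[str, int]], float],
          episodes: int = 300, seed: int = 0) -> None:
    rng = random.Random(seed)
    baseline = 0.0  # shared scalar baseline standing in for a centralized critic
    for _ in range(episodes):
        # Decentralized execution: each agent picks its own pattern sentence.
        actions = {p: agents[p].act(rng) for p in PATTERNS}
        # In RPP this would be NDCG@10 of the frozen LLM's ranking for the prompt.
        reward = reward_fn(actions)
        advantage = reward - baseline
        # Shared reward: every agent learns from the same advantage signal.
        for p in PATTERNS:
            agents[p].update(actions[p], advantage)
        baseline += 0.1 * (reward - baseline)


def toy_reward(actions: Dict[str, int]) -> float:
    # Hypothetical stand-in for "build prompt, query frozen LLM, score ranking".
    return 1.0 if actions["reasoning_guidance"] == 1 else 0.2


agents = {p: PatternAgent(n_actions=2) for p in PATTERNS}
train(agents, toy_reward)
best = {p: max(range(2), key=lambda a: agents[p].logits[a]) for p in PATTERNS}
print(best)
```

Under this toy reward, only the reasoning-guidance agent has a meaningful signal, so it should come to prefer action 1 while the other agents' preferences remain arbitrary. A real setup would feed each episode's prompt to the LLM and score the returned ranking.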
