Selecting User Histories to Generate LLM Users for Cold-Start Item Recommendation
Nachiket Subbaraman, Jaskinder Sarai, Aniruddh Nath, Lichan Hong, Lukasz Heldt, Li Wei, Zhe Zhao
TL;DR
The paper tackles cold-start item recommendation by using an LLM to generate augmented interactions for new items and by learning which users to augment with a policy-gradient RL framework. The LLM is prompted to act as a user given that user's full history, and a two-tower recommender is then trained on the augmented data. The user-selection policy is trained in a contextual-bandit setup with per-user rewards to maximize cold-start recall. Key contributions include the LLM-as-user augmentation, a REINFORCE-based user-selection policy, and proxy rewards for efficiency. Experiments on the Amazon Beauty and Sports datasets show higher cold-start recall than the baselines. The approach is a scalable, serving-efficient augmentation strategy that improves cold-start item performance and opens avenues for cross-domain extensions and more advanced policy optimization.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning, generalization, and simulating human-like behavior across a wide range of tasks. These strengths present new opportunities to enhance traditional recommendation systems (RS), especially in the cold-start item scenario, where newly introduced items lack interactions. Existing works have used LLMs to address cold-start issues in traditional RS through data augmentation, but these approaches have limitations. One recent work prompts LLMs to generate augmented interaction data between randomly sampled users and cold-start items, and then trains the traditional RS on the augmented data so that it incorporates collaborative signals for cold-start items. Although this approach uses LLMs to provide feedback on cold-start items, it relies on partial user histories, which prevents the LLM from fully emulating the user. Furthermore, randomly selecting users is not optimal for augmentation. To address these challenges, we use the LLM as a user and develop a reinforcement learning (RL) framework that trains a policy to select users for augmentation, optimizing for cold-start item performance after augmented training. The policy model learns to select users for cold-start item data augmentation based on their behavioral features and histories. To optimize user selection for cold-start item performance, we employ a policy gradient method that updates the policy in the direction of actions that lead to high rewards. Experiments on Amazon Product Review datasets show substantial gains in cold-start item recall, demonstrating the effectiveness of our method as a scalable, serving-efficient augmentation strategy for modern RS.
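To make the policy-gradient idea above concrete, the following is a minimal sketch of a REINFORCE-style update for user selection in a contextual-bandit setting. It is not the paper's implementation. The logistic policy, the mean-reward baseline, the toy feature construction, and the names train_selection_policy, make_toy_problem, and reward_fn are all assumptions made for illustration. In particular, the toy reward stands in for the cold-start recall that the paper measures after augmented training.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_selection_policy(features, reward_fn, steps=500, lr=0.5, seed=0):
    """REINFORCE for a per-user Bernoulli (select / skip) policy.

    features:  (n_users, dim) array of per-user behavioral features.
    reward_fn: hypothetical callable mapping a 0/1 selection vector to
               per-user rewards (a stand-in for cold-start recall).
    """
    rng = np.random.default_rng(seed)
    n_users, dim = features.shape
    theta = np.zeros(dim)  # parameters of the assumed logistic policy
    for _ in range(steps):
        probs = sigmoid(features @ theta)                      # P(select user)
        actions = (rng.random(n_users) < probs).astype(float)  # sampled selections
        rewards = reward_fn(actions)
        advantages = rewards - rewards.mean()                  # simple baseline (assumption)
        # For a Bernoulli-logistic policy, grad log pi(a | x) = (a - p) * x.
        grad = ((actions - probs) * advantages) @ features / n_users
        theta += lr * grad                                     # ascend expected reward
    return theta


def make_toy_problem(n_users=200):
    """Toy setup: selecting every other user helps, selecting the rest hurts."""
    helpful = np.arange(n_users) % 2 == 0
    # Feature 0 encodes the (normally hidden) usefulness; feature 1 is a bias term.
    features = np.column_stack([np.where(helpful, 1.0, -1.0), np.ones(n_users)])

    def reward_fn(actions):
        # Selected users earn +1 or -1; unselected users earn 0.
        return actions * np.where(helpful, 1.0, -1.0)

    return features, helpful, reward_fn


features, helpful, reward_fn = make_toy_problem()
theta = train_selection_policy(features, reward_fn)
probs = sigmoid(features @ theta)
print("mean P(select | helpful):  ", probs[helpful].mean())
print("mean P(select | unhelpful):", probs[~helpful].mean())
```

In this toy problem the policy should learn to assign a high selection probability to users whose augmentation raises the reward and a low probability to the others. In a real pipeline, each reward evaluation would involve LLM generation and recommender training, which is presumably why proxy rewards matter for efficiency.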
