LLaRA: Large Language-Recommendation Assistant
Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He
TL;DR
The paper addresses sequential recommendation by combining the world knowledge and reasoning capabilities of large language models (LLMs) with the behavioral modeling of traditional sequential recommenders. It introduces LLaRA, a framework built on two components: hybrid prompting, which fuses textual item metadata with behavior-derived embeddings through an SR2LLM projector, and curriculum prompt tuning, which warms up on text-only prompts before introducing hybrid prompts. The core contributions are (1) hybrid item representations that combine textual and behavioral information, (2) a modality-alignment approach that treats sequential behaviors as a distinct input modality for LLMs, and (3) curriculum-based training that improves stability and performance. Experiments on MovieLens, Steam, and LastFM show higher HitRatio@1 than strong baselines. The work points to a practical path toward unified recommender systems that draw on both world knowledge and user behavior modeling, with potential extensions to broader multimodal and instruction-tuned recommendation scenarios.
Abstract
Sequential recommendation aims to predict users' next interaction with items based on their past engagement sequence. Recently, the advent of Large Language Models (LLMs) has sparked interest in leveraging them for sequential recommendation, viewing it as language modeling. Previous studies represent items within LLMs' input prompts as either ID indices or textual metadata. However, these approaches often either fail to encapsulate comprehensive world knowledge or fail to exhibit sufficient behavioral understanding. To combine the complementary strengths of conventional recommenders in capturing behavioral patterns of users and LLMs in encoding world knowledge about items, we introduce Large Language-Recommendation Assistant (LLaRA). Specifically, it uses a novel hybrid prompting method that integrates ID-based item embeddings learned by traditional recommendation models with textual item features. Treating the "sequential behaviors of users" as a distinct modality beyond texts, we employ a projector to align the traditional recommender's ID embeddings with the LLM's input space. Moreover, rather than directly exposing the hybrid prompt to LLMs, a curriculum learning strategy is adopted to gradually ramp up training complexity. Initially, we warm up the LLM using text-only prompts, which better suit its inherent language modeling ability. Subsequently, we progressively transition to the hybrid prompts, training the model to seamlessly incorporate the behavioral knowledge from the traditional sequential recommender into the LLM. Empirical results validate the effectiveness of our proposed framework. Code is available at https://github.com/ljy0ustc/LLaRA.
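The two mechanisms described in the abstract, projecting ID embeddings into the LLM input space and gradually shifting from text-only to hybrid prompts, can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names are hypothetical, the projector is assumed to be a single linear map, and the linear ramp `step / total_steps` is an assumed schedule, since the abstract only says that the transition is progressive.

```python
import random


def hybrid_probability(step: int, total_steps: int) -> float:
    """Probability of sampling a hybrid prompt at a training step.

    Assumption: a linear ramp from 0 to 1 over training, clamped to [0, 1].
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def choose_prompt_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Pick 'text_only' early in training and 'hybrid' increasingly often later."""
    if rng.random() < hybrid_probability(step, total_steps):
        return "hybrid"
    return "text_only"


def project(id_embedding, weight, bias):
    """Map a recommender ID embedding into the LLM input space.

    Assumption: one linear layer (weight has shape [llm_dim][rec_dim]).
    """
    return [
        sum(w * x for w, x in zip(row, id_embedding)) + b
        for row, b in zip(weight, bias)
    ]


def build_item_tokens(text_token_embeddings, id_embedding, weight, bias, mode):
    """Return the token embeddings that represent one item in the prompt.

    Text-only mode keeps just the textual tokens; hybrid mode appends the
    projected behavioral embedding as one extra token.
    """
    tokens = list(text_token_embeddings)
    if mode == "hybrid":
        tokens.append(project(id_embedding, weight, bias))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dim -> 3-dim projector
    bias = [0.0, 0.0, 0.5]
    title_tokens = [[0.1, 0.2, 0.3]]  # toy embedding of an item title
    for step in (0, 50, 100):
        mode = choose_prompt_mode(step, 100, rng)
        tokens = build_item_tokens(title_tokens, [1.0, 2.0], weight, bias, mode)
        print(step, mode, len(tokens))
```

With this schedule the first step always uses a text-only prompt and the last step always uses a hybrid prompt, while steps in between mix the two, so the model meets the projected behavioral token only after it has adapted to the plain-text task.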
