LLaRA: Large Language-Recommendation Assistant

Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, Xiangnan He

TL;DR

The paper tackles the challenge of sequential recommendation by leveraging the world knowledge and reasoning capabilities of large language models (LLMs) while preserving the strong behavioral modeling of traditional sequential recommenders. It introduces LLaRA, a framework that uses hybrid prompting to fuse textual item metadata with behavior-derived embeddings via an SR2LLM projector, and a curriculum prompt tuning scheme that warms up with text-only prompts before integrating hybrid prompts. The core contributions include (1) hybrid item representations that combine textual and behavioral information, (2) a modality-alignment approach that treats sequential behaviors as a distinct input modality for LLMs, and (3) curriculum-based training that improves stability and performance, validated by experiments showing superior HitRatio@1 across MovieLens, Steam, and LastFM compared with strong baselines. The work demonstrates a practical path toward unified recommender systems that capitalize on both extensive world knowledge and robust user behavior modeling, with potential for broader multimodal and instruction-tuned recommendation scenarios.

Abstract

Sequential recommendation aims to predict users' next interaction with items based on their past engagement sequence. Recently, the advent of Large Language Models (LLMs) has sparked interest in leveraging them for sequential recommendation, viewing it as language modeling. Previous studies represent items within LLMs' input prompts as either ID indices or textual metadata. However, these approaches often fail to either encapsulate comprehensive world knowledge or exhibit sufficient behavioral understanding. To combine the complementary strengths of conventional recommenders in capturing behavioral patterns of users and LLMs in encoding world knowledge about items, we introduce Large Language-Recommendation Assistant (LLaRA). Specifically, it uses a novel hybrid prompting method that integrates ID-based item embeddings learned by traditional recommendation models with textual item features. Treating the "sequential behaviors of users" as a distinct modality beyond texts, we employ a projector to align the traditional recommender's ID embeddings with the LLM's input space. Moreover, rather than directly exposing the hybrid prompt to LLMs, a curriculum learning strategy is adopted to gradually ramp up training complexity. Initially, we warm up the LLM using text-only prompts, which better suit its inherent language modeling ability. Subsequently, we progressively transition to the hybrid prompts, training the model to seamlessly incorporate the behavioral knowledge from the traditional sequential recommender into the LLM. Empirical results validate the effectiveness of our proposed framework. Code is available at https://github.com/ljy0ustc/LLaRA.
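
The hybrid item representation described in the abstract can be illustrated with a small sketch. It assumes the projector is a single linear map, which is a simplification: the abstract only says "a projector". The names `project`, `hybrid_item_tokens`, `weight`, and `bias` are hypothetical and are not taken from the LLaRA code base. Plain Python lists stand in for tensors so the example has no dependencies.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(item_embedding: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Map a recommender ID embedding (size d_rec) into the LLM input space (size d_llm).

    Assumption: a single linear layer; `weight` has shape (d_llm, d_rec).
    """
    if len(weight) != len(bias):
        raise ValueError("weight and bias must agree on the output size")
    if any(len(row) != len(item_embedding) for row in weight):
        raise ValueError("each weight row must match the embedding size")
    return [
        sum(w * x for w, x in zip(row, item_embedding)) + b
        for row, b in zip(weight, bias)
    ]


def hybrid_item_tokens(
    text_token_embeddings: List[Vector],
    item_embedding: Vector,
    weight: Matrix,
    bias: Vector,
) -> List[Vector]:
    """Append one projected behavioral token to an item's textual token embeddings."""
    behavioral_token = project(item_embedding, weight, bias)
    return list(text_token_embeddings) + [behavioral_token]


# Toy sizes: d_rec = 2 (recommender space), d_llm = 3 (LLM input space).
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]
title_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # stand-in for embedded title tokens
item_embedding = [2.0, 3.0]                         # stand-in for a frozen recommender's output

hybrid = hybrid_item_tokens(title_tokens, item_embedding, weight, bias)
print(len(hybrid), hybrid[-1])
```

In this sketch only `weight` and `bias` would be trained. The recommender embedding is treated as a fixed input, in line with the frozen sequential recommender described in the Figure 2 caption.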

Paper Structure (24 sections, 11 equations, 5 figures, 3 tables)


Figures (5)

  • Figure 1: Comparison among three prior item representation methods and ours. (a) ID Number: represents an item with a numerical index. (b) Randomly-initialized ID Token: represents an item with an independent out-of-vocabulary (OOV) token. (c) Text Metadata: represents an item with its textual features, such as item title. (d) Hybrid Item Representation: integrates both textual tokens and behavioral tokens derived from the ID-based item embedding learned by traditional recommender models.
  • Figure 2: The LLaRA framework. (a) Sequential recommendation data is transformed into the instruction-tuning format. The item representation example illustrates the transition from pure textual tokens to the integration of the textual tokens with the behavioral token. (b) The sequential recommender is well-trained and frozen, while the trainable projector bridges the sequential recommender and LLM space.
  • Figure 3: Illustration of text-only and hybrid prompting method. (a) Text-only prompting represents items with the combination of the textual token and a placeholder token. (b) Hybrid prompting represents items with the integration of the textual token and the behavioral token. Note that <PH> indicates a special placeholder token, reserved for substitution by the behavioral token $\text{<}emb_s^i\text{>}$ throughout the progressive learning procedure.
  • Figure 4: The performance comparison of different item representation methods (i.e., numerical index, behavioral token, textual feature, and hybrid representation). The hybrid representation is adopted in LLaRA.
  • Figure 5: Case studies. (a) The user prefers adventure and war genres according to the viewing history. With the world knowledge about these movies, TALLRec and LLaRA correctly recommend "The Great Escape". (b) SASRec and LLaRA recommend "Batman & Robin", according to the sequential behavioral patterns of users.
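
The text-only and hybrid prompting modes from Figure 3, together with the warm-up-then-transition curriculum from the abstract, can be sketched as follows. The linear ramp in `hybrid_probability` is an assumption chosen for illustration, since the text here only says the transition is progressive. The function names are hypothetical, and strings such as `<emb_1>` stand in for projected behavioral embeddings.

```python
import random
from typing import List, Sequence

PLACEHOLDER = "<PH>"  # placeholder token named in the Figure 3 caption


def hybrid_probability(step: int, total_steps: int) -> float:
    """Probability of using a hybrid prompt at a given training step.

    Assumption: a linear ramp from 0 (text-only warm-up) to 1 (fully hybrid).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(max(step / total_steps, 0.0), 1.0)


def pick_mode(step: int, total_steps: int, rng: random.Random) -> str:
    """Sample the prompting mode for one training example."""
    if rng.random() < hybrid_probability(step, total_steps):
        return "hybrid"
    return "text_only"


def render_prompt(
    tokens: Sequence[str], behavioral_tokens: Sequence[str], mode: str
) -> List[str]:
    """Keep <PH> in text-only mode; substitute behavioral tokens in hybrid mode."""
    if mode == "text_only":
        return list(tokens)
    if mode != "hybrid":
        raise ValueError(f"unknown mode: {mode}")
    if sum(t == PLACEHOLDER for t in tokens) != len(behavioral_tokens):
        raise ValueError("need exactly one behavioral token per placeholder")
    remaining = iter(behavioral_tokens)
    return [next(remaining) if t == PLACEHOLDER else t for t in tokens]


rng = random.Random(0)
prompt = ["Titanic", PLACEHOLDER, "Heat", PLACEHOLDER]
behavioral = ["<emb_1>", "<emb_2>"]  # stand-ins for projected embeddings

for step in (0, 5, 10):
    mode = pick_mode(step, total_steps=10, rng=rng)
    print(step, mode, render_prompt(prompt, behavioral, mode))
```

Sampling the mode per example, rather than switching all at once, is one way to realize a gradual transition. Other schedules (step-wise or non-linear) would fit the same interface.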