Towards a Unified Paradigm: Integrating Recommendation Systems as a New Language in Large Models
Kai Zheng, Qingfeng Sun, Can Xu, Peng Yu, Qingwei Guo
TL;DR
The paper proposes RSLLM, a unified paradigm that treats the signals of a traditional sequential recommender as a new language for large language models. Its unified prompting combines ID-based item embeddings with textual item features through Text-ID prompting and Hybrid Encoding with a trainable projector, so the LLM can process both semantic and collaborative information. A two-stage fine-tuning framework, a first stage on text-only prompts followed by target-domain fine-tuning with contrastive alignment (a Next Item Prediction loss plus two InfoNCE-like losses), trains the model to fuse behavioral knowledge from the recommender into the LLM. Experiments on MovieLens, Steam, and LastFM show RSLLM outperforming both traditional and other LLM-based sequential recommenders, with high validity, and ablations support the importance of multimodal item representations, cross-domain alignment, and staged training. The work suggests a practical path toward more accurate, knowledge-rich, instruction-following sequential recommendation with large language models, albeit with notable computational demands.
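To make the "InfoNCE-like" contrastive alignment concrete, here is a minimal sketch of a generic in-batch InfoNCE loss in plain Python. It is not the paper's implementation: the function name `info_nce`, the cosine similarity, the temperature value, and the use of the other rows in the batch as negatives are all assumptions made for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper).

    positives[i] is the matching view of anchors[i]; every other row of
    `positives` acts as an in-batch negative for that anchor.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [dot(anchor, cand) / temperature for cand in p]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Correctly paired views give a lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a setup like the one summarized above, the two views could be, for example, a text-based and an ID-based representation of the same item, so that minimizing the loss pulls them together; which pairs RSLLM actually contrasts is not specified in this summary.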
Abstract
This paper explores the use of Large Language Models (LLMs) for sequential recommendation, which predicts users' future interactions based on their past behavior. We introduce a new concept, "Integrating Recommendation Systems as a New Language in Large Models" (RSLLM), which combines the strengths of traditional recommenders and LLMs. RSLLM uses a unique prompting method that combines ID-based item embeddings from conventional recommendation models with textual item features. It treats users' sequential behaviors as a distinct language and aligns the ID embeddings with the LLM's input space using a projector. We also propose a two-stage LLM fine-tuning framework that refines a pretrained LLM using a combination of two contrastive losses and a language modeling loss. The LLM is first fine-tuned using text-only prompts, followed by target domain fine-tuning with unified prompts. This trains the model to incorporate behavioral knowledge from the traditional sequential recommender into the LLM. Our empirical results validate the effectiveness of our proposed framework.
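The idea of aligning ID embeddings with the LLM's input space through a projector can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' code: the projector is taken to be a single linear layer with random (untrained) weights, each item contributes its title token embeddings followed by one projected ID vector, and the names `make_projector`, `project`, and `build_hybrid_sequence` are hypothetical.

```python
import random


def make_projector(id_dim, llm_dim, seed=0):
    # Assumed form: one linear layer (weight and bias). In practice these
    # parameters would be learned during fine-tuning, not left random.
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(id_dim)]
              for _ in range(llm_dim)]
    bias = [0.0] * llm_dim
    return weight, bias


def project(id_embedding, projector):
    # Map a recommender ID embedding (id_dim) into the LLM input space (llm_dim).
    weight, bias = projector
    return [sum(w * x for w, x in zip(row, id_embedding)) + b
            for row, b in zip(weight, bias)]


def build_hybrid_sequence(items, projector):
    """items: list of (title_token_embeddings, id_embedding) pairs.

    Returns one sequence of llm_dim vectors: for each item, its text token
    embeddings followed by its projected ID embedding.
    """
    sequence = []
    for title_tokens, id_embedding in items:
        sequence.extend(title_tokens)
        sequence.append(project(id_embedding, projector))
    return sequence


projector = make_projector(id_dim=3, llm_dim=4)
history = [
    ([[0.1] * 4, [0.2] * 4], [1.0, 0.0, 0.0]),  # item with a two-token title
    ([[0.3] * 4], [0.0, 1.0, 0.0]),             # item with a one-token title
]
sequence = build_hybrid_sequence(history, projector)
print(len(sequence), len(sequence[0]))
```

Because every vector in the resulting sequence has the LLM's embedding width, the sequence could in principle be fed to the model in place of ordinary token embeddings, which is what lets the behavior history be read like another language.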
