LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach
Haoran Xin, Ying Sun, Chao Wang, Weijia Zhang, Hui Xiong
TL;DR
This paper addresses how to improve LLM-based recommendation by bridging the semantic gap between collaborative information (CI) and natural language. It introduces SCORE, a two-stage retrieve-rerank framework. A Collaborative Retriever (CAR) fuses collaborative and semantic signals to retrieve candidate user behaviors, and a Self-assessing Reranker (SARE) uses LLM-driven self-assessment to prioritize them; the top retrieved behaviors, expressed in natural language, are prepended to the LLM prompt. Experiments on MovieLens-1M and Games show that natural-language CI delivered via SCORE consistently improves recommendation performance, while offering CRM-agnostic flexibility and interpretability through attention visualizations. The work highlights the practical potential of integrating non-parametric, natural-language CI into LLM-based recommender systems, acknowledges trade-offs in prompt length and representation richness, and outlines future work toward more compact natural-language representations and richer behavior modeling.
Abstract
Incorporating collaborative information (CI) effectively is crucial for leveraging LLMs in recommendation tasks. Existing approaches often encode CI using soft tokens or abstract identifiers, which introduces a semantic misalignment with the LLM's natural language pretraining and hampers knowledge integration. To address this, we propose expressing CI directly in natural language to better align with LLMs' semantic space. We achieve this by retrieving a curated set of the most relevant user behaviors in natural language form. However, identifying informative CI is challenging due to the complexity of similarity and utility assessment. To tackle this, we introduce a Self-assessing COllaborative REtrieval framework (SCORE) following the retrieve-rerank paradigm. First, a Collaborative Retriever (CAR) is developed to consider both collaborative patterns and semantic similarity. Then, a Self-assessing Reranker (SARE) leverages LLMs' own reasoning to assess and prioritize retrieved behaviors. Finally, the selected behaviors are prepended to the LLM prompt as natural-language CI to guide recommendation. Extensive experiments on two public datasets validate the effectiveness of SCORE in improving LLM-based recommendation.
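The retrieve-rerank flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Behavior` record, the weighted-cosine fusion with weight `alpha`, the `assess` callable standing in for an LLM's self-assessment score, and the prompt wording are all hypothetical choices made for illustration, since the abstract does not specify how CAR or SARE compute their scores.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Behavior:
    """Hypothetical record for one user's behavior."""
    text: str                    # natural-language description of the behavior
    collab_vec: Sequence[float]  # assumed collaborative embedding
    sem_vec: Sequence[float]     # assumed semantic (text) embedding


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Behavior, pool: List[Behavior], k: int,
             alpha: float = 0.5) -> List[Behavior]:
    """Stage 1 (CAR-like): rank by an assumed blend of collaborative and
    semantic similarity, and keep the top k candidates."""
    def fused(b: Behavior) -> float:
        return (alpha * cosine(query.collab_vec, b.collab_vec)
                + (1 - alpha) * cosine(query.sem_vec, b.sem_vec))
    return sorted(pool, key=fused, reverse=True)[:k]


def rerank(candidates: List[Behavior],
           assess: Callable[[Behavior], float], top_n: int) -> List[Behavior]:
    """Stage 2 (SARE-like): reorder candidates by a usefulness score.
    `assess` is a placeholder for an LLM-based self-assessment."""
    return sorted(candidates, key=assess, reverse=True)[:top_n]


def build_prompt(selected: List[Behavior], task_prompt: str) -> str:
    """Prepend the selected behaviors, in natural language, to the prompt."""
    lines = "\n".join(f"- {b.text}" for b in selected)
    return f"Behaviors of similar users:\n{lines}\n\n{task_prompt}"
```

In use, one would call `retrieve` with the target user's behavior as the query, pass the result to `rerank` with a scoring function backed by the LLM, and feed the output of `build_prompt` to the recommender LLM. Keeping the assessor as a plain callable makes it easy to swap in a stub for testing.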
