LLMs as Better Recommenders with Natural Language Collaborative Signals: A Self-Assessing Retrieval Approach

Haoran Xin, Ying Sun, Chao Wang, Weijia Zhang, Hui Xiong

TL;DR

This paper addresses how to improve LLM-based recommendation by bridging the semantic gap between collaborative information (CI) and natural language. It introduces SCORE, a two-stage retrieve-rerank framework: a Collaborative Retriever (CAR) fuses collaborative and semantic signals, and a Self-Assessing Reranker (SARE) uses LLM-driven self-assessment to prioritize the retrieved signals. The top-ranked user behaviors are expressed in natural language and prepended to the LLM prompt. Experiments on MovieLens-1M and Games show that natural-language CI delivered via SCORE consistently improves recommendation performance, while offering CRM-agnostic flexibility and interpretability through attention visualizations. The work highlights the practical potential of integrating non-parametric, natural-language CI into LLM-based recommender systems, acknowledges trade-offs in prompt length and representation richness, and outlines future work toward more compact natural-language representations and richer behavior modeling.

Abstract

Incorporating collaborative information (CI) effectively is crucial for leveraging LLMs in recommendation tasks. Existing approaches often encode CI using soft tokens or abstract identifiers, which introduces a semantic misalignment with the LLM's natural language pretraining and hampers knowledge integration. To address this, we propose expressing CI directly in natural language to better align with LLMs' semantic space. We achieve this by retrieving a curated set of the most relevant user behaviors in natural language form. However, identifying informative CI is challenging due to the complexity of similarity and utility assessment. To tackle this, we introduce a Self-assessing COllaborative REtrieval framework (SCORE) following the retrieve-rerank paradigm. First, a Collaborative Retriever (CAR) is developed to consider both collaborative patterns and semantic similarity. Then, a Self-assessing Reranker (SARE) leverages LLMs' own reasoning to assess and prioritize retrieved behaviors. Finally, the selected behaviors are prepended to the LLM prompt as natural-language CI to guide recommendation. Extensive experiments on two public datasets validate the effectiveness of SCORE in improving LLM-based recommendation.
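
The retrieve-rerank flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `UserRecord` fields, the weighted-cosine fusion with weight `alpha`, the `utility_fn` callback standing in for LLM self-assessment, and the prompt wording are all hypothetical choices made for illustration. Only the overall shape (retrieve $K_e$ similar users, rerank down to $K_s$, prepend their behaviors in natural language) follows the text.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable


@dataclass
class UserRecord:
    user_id: str
    collab_vec: list[float]  # hypothetical collaborative embedding
    sem_vec: list[float]     # hypothetical semantic embedding of the history
    behavior_text: str       # the user's behavior described in natural language


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(target: UserRecord, pool: list[UserRecord],
             k_e: int, alpha: float = 0.5) -> list[UserRecord]:
    """Stage 1 stand-in for CAR: mix collaborative and semantic similarity.

    The linear mix with weight alpha is an assumption, not the paper's method.
    """
    def score(u: UserRecord) -> float:
        return (alpha * cosine(target.collab_vec, u.collab_vec)
                + (1 - alpha) * cosine(target.sem_vec, u.sem_vec))
    return sorted(pool, key=score, reverse=True)[:k_e]


def rerank(candidates: list[UserRecord],
           utility_fn: Callable[[UserRecord], float],
           k_s: int) -> list[UserRecord]:
    """Stage 2 stand-in for SARE: utility_fn is a placeholder for the
    LLM-driven self-assessment of how useful each candidate is."""
    return sorted(candidates, key=utility_fn, reverse=True)[:k_s]


def build_prompt(target: UserRecord, selected: list[UserRecord],
                 question: str) -> str:
    """Prepend the selected users' behaviors, as plain text, to the prompt."""
    ci_lines = [f"- A similar user {u.behavior_text}" for u in selected]
    return "\n".join([
        "Behaviors of similar users:",
        *ci_lines,
        "",
        f"The target user {target.behavior_text}",
        question,
    ])
```

A caller would chain the three steps, for example `build_prompt(target, rerank(retrieve(target, pool, k_e=20), utility_fn, k_s=5), question)`, where the values 20 and 5 are arbitrary. Keeping the reranker behind a plain callback makes it easy to swap a toy heuristic for an actual LLM-based scorer.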

Paper Structure

This paper contains 22 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The use of soft tokens or less meaningful identifiers as CI introduces a semantic gap that impairs LLM understanding. In contrast, LLMs could more effectively comprehend CI conveyed in natural language.
  • Figure 2: (a) Overview of the SCORE framework. A two-stage fine-tuning paradigm is used to develop the collaborative retriever (CAR) and the self-assessing reranker (SARE). (b) Illustration of the self-assessing ranking process. The LLM evaluates the characteristics of beneficial CI for recommendation, which guides the reranking of retrieved similar users. (c) User behaviors are retrieved and reranked by CAR and SARE, then prepended to the prompt in natural language to enhance LLM-based recommendations.
  • Figure 3: Recommendation performance of different variants of SCORE.
  • Figure 4: Performance as the number of retrieved users $K_e$ and the size of the final user set $K_s$ vary.
  • Figure 5: Attention visualization of the input prompt.
  • ...and 4 more figures