Do LLMs Understand Collaborative Signals? Diagnosis and Repair

Shahrooz Pouryousef, Ali Montazeralghaem

TL;DR

This work probes whether large language models can reason over collaborative signals in recommender contexts and compares them to matrix factorization (MF). It introduces a retrieval-augmented generation framework that supplies LLMs with compact, structured signals derived from the ratings of the top-$k$ similar users, and it explores four prompting strategies, including a reasoning-based approach. Results show that when the LLM is prompted with organized context and explicit reasoning directives, it can outperform MF, particularly for cold-start users. For reasoning prompts, adding more contextual information generally improves performance, though prompt length still has to be kept in check. The study highlights the importance of prompt design and selective information retrieval for scalable, LLM-based recommender systems, with practical implications for reducing latency and improving cold-start recommendations.

Abstract

Collaborative information from user-item interactions is a fundamental source of signal in successful recommender systems. Recently, researchers have attempted to incorporate this knowledge into large language model-based recommender approaches (LLMRec) to enhance their performance. However, there has been little fundamental analysis of whether LLMs can effectively reason over collaborative information. In this paper, we analyze the ability of LLMs to reason about collaborative information in recommendation tasks, comparing their performance to traditional matrix factorization (MF) models. We propose a simple and effective method to improve LLMs' reasoning capabilities using retrieval-augmented generation (RAG) over the user-item interaction matrix with four different prompting strategies. Our results show that the LLM outperforms the MF model whenever we provide relevant information in a clear and easy-to-follow format, and prompt the LLM to reason based on it. We observe that with this strategy, in almost all cases, the more information we provide, the better the LLM performs.
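To make the retrieval step concrete, here is a minimal sketch of how ratings from the top-$k$ similar users could be pulled from a user-item matrix and laid out in a prompt. It is an illustration under assumptions, not the paper's implementation: the use of cosine similarity, the function names `top_k_similar_users` and `build_prompt`, the toy rating matrix, and the prompt wording are all hypothetical, and the four prompting strategies studied in the paper are not reproduced here.

```python
import numpy as np

def top_k_similar_users(ratings, user, k):
    """Return the k users most cosine-similar to `user` (0 means unrated).

    Cosine similarity is an assumption made for this sketch.
    """
    norms = np.linalg.norm(ratings, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for users with no ratings
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = -np.inf  # never retrieve the target user itself
    return [int(i) for i in np.argsort(-sims)[:k]]

def build_prompt(ratings, titles, user, k):
    """Lay out neighbours' ratings of unseen movies as a structured prompt.

    The wording below is hypothetical, not a prompt taken from the paper.
    """
    seen = ratings[user] > 0
    lines = ["Ratings from the most similar users for movies the target user has not seen:"]
    for n in top_k_similar_users(ratings, user, k):
        # Keep only movies the neighbour rated and the target user did not.
        for item in np.flatnonzero((ratings[n] > 0) & ~seen):
            lines.append(f"- similar user {n} rated '{titles[item]}' {ratings[n, item]:.0f}/5")
    lines.append("Reason step by step over these ratings, then rank the unseen movies.")
    return "\n".join(lines)

# Toy data: 4 users x 4 movies, ratings on a 1-5 scale, 0 = not rated.
ratings = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 0],
    [1, 0, 5, 4],
    [4, 4, 0, 5],
], dtype=float)
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]

neighbors = top_k_similar_users(ratings, user=0, k=2)
prompt = build_prompt(ratings, titles, user=0, k=2)
print(prompt)
```

Restricting the prompt to the top-$k$ neighbours and to unseen movies keeps it short, which is the point of retrieving selectively rather than passing the whole interaction matrix to the model.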

Paper Structure

This paper contains 13 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Comparison of four prompt-generation strategies for movie recommendation based on retrieved user–movie similarities. Each method varies in how it incorporates similar users’ ratings, handles previously seen movies, and structures the prompt for downstream recommendation.
  • Figure 2: NDCG score for different prompt generation strategies and MF as a function of $k$ and $f$.
  • Figure 3: Hit@10 score for different prompt generation strategies and MF as a function of $k$ and $f$.
  • Figure 4: Processing time of prompts and the number of tokens in each prompt for hot and cold users.
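
For readers unfamiliar with the metrics named in Figures 2 and 3, the sketch below computes Hit@10 and NDCG using their standard binary-relevance definitions. The exact evaluation protocol is not given in this section, so the cutoff of 10 for NDCG, the function names, and the toy ranking are assumptions made for illustration.

```python
import math

def hit_at_k(ranked, relevant, k=10):
    """1.0 if any relevant item appears in the top-k of the ranking, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def ndcg_at_k(ranked, relevant, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 gets 1/log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the single held-out movie "b" is ranked second.
ranked = ["a", "b", "c"]
hit = hit_at_k(ranked, {"b"})
ndcg = ndcg_at_k(ranked, {"b"})
```

Hit@10 only records whether a relevant item made the list, while NDCG also rewards placing it nearer the top, so the two figures can diverge for the same set of recommendations.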