Enhancing Rating Prediction with Off-the-Shelf LLMs Using In-Context User Reviews

Koki Ryu, Hitomi Yanaka

TL;DR

The paper investigates Likert-scale rating prediction with off-the-shelf LLMs that receive in-context user reviews as preference data. It shows that placing per-item reviews in the prompt (RS → S) consistently raises Spearman correlation and lowers RMSE across eight models and three datasets, giving results competitive with traditional matrix factorization (MF) baselines and mitigating cold-start issues. A prompting strategy that first generates a hypothetical review (RS → RS) often yields further gains, especially for smaller models, though its effect on RMSE is mixed. The findings demonstrate the potential of lightweight, data-efficient personalization with off-the-shelf LLMs and highlight the value of item-specific textual evidence of user preferences in rating prediction.
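
The two metrics named above can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `rmse`, `average_ranks`, and `spearman` are hypothetical, and the choice to handle tied ratings with average ranks is an assumption (ties are common on a 5-point scale). How the paper aggregates scores across users is not stated here, so the sketch only scores a single list of predictions.

```python
import math


def rmse(pred, gold):
    """Root mean squared error between predicted and gold ratings."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and the same length")
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(pred))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(pred, gold):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(pred) != len(gold) or not pred:
        raise ValueError("pred and gold must be non-empty and the same length")
    rp, rg = average_ranks(pred), average_ranks(gold)
    n = len(rp)
    mp, mg = sum(rp) / n, sum(rg) / n
    cov = sum((a - mp) * (b - mg) for a, b in zip(rp, rg))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_g = sum((b - mg) ** 2 for b in rg)
    if var_p == 0 or var_g == 0:
        # Undefined when one side is constant (e.g. every prediction is 3).
        return float("nan")
    return cov / math.sqrt(var_p * var_g)
```

Spearman correlation rewards getting the ordering of a user's preferences right, while RMSE penalizes the distance from the exact score, which is one reason a prompting change can help one metric and not the other.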

Abstract

Personalizing the outputs of large language models (LLMs) to align with individual user preferences is an active research area. However, previous studies have mainly focused on classification or ranking tasks and have not considered Likert-scale rating prediction, a regression task that requires both language and mathematical reasoning to be solved effectively. This task has significant industrial applications, but the utilization of LLMs remains underexplored, particularly regarding the capabilities of off-the-shelf LLMs. This study investigates the performance of off-the-shelf LLMs on rating prediction, providing different in-context information. Through comprehensive experiments with eight models across three datasets, we demonstrate that user-written reviews significantly improve the rating prediction performance of LLMs. This result is comparable to traditional methods like matrix factorization, highlighting the potential of LLMs as a promising solution for the cold-start problem. We also find that the reviews for concrete items are more effective than general preference descriptions that are not based on any specific item. Furthermore, we discover that prompting LLMs to first generate a hypothetical review enhances the rating prediction performance. Our code is available at https://github.com/ynklab/rating-prediction-with-reviews.
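
To make the in-context formats concrete, the sketch below assembles a prompt for each of them. It reads S → S as "past scores in, score out", RS → S as "past reviews and scores in, score out", and RS → RS as "past reviews and scores in, hypothetical review then score out"; that reading is inferred from the summary above. The prompt wording, the `PastItem` class, and the `build_prompt` function are all hypothetical and are not the prompts used in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PastItem:
    """One item the user has already rated (hypothetical structure)."""
    title: str
    score: int
    review: Optional[str] = None


def build_prompt(history: List[PastItem], target_title: str,
                 fmt: str = "RS->S", scale_max: int = 5) -> str:
    """Assemble an illustrative prompt for one of the three formats."""
    if fmt not in {"S->S", "RS->S", "RS->RS"}:
        raise ValueError(f"unknown format: {fmt}")
    # Reviews are shown only when the input side of the format contains R.
    show_reviews = fmt.startswith("RS")
    lines = ["Past ratings by this user:"]
    for item in history:
        lines.append(f"- Item: {item.title}")
        if show_reviews and item.review:
            lines.append(f"  Review: {item.review}")
        lines.append(f"  Score: {item.score}/{scale_max}")
    lines.append(f"Target item: {target_title}")
    if fmt == "RS->RS":
        # Ask for a hypothetical review before the score.
        lines.append(
            "First write the review this user would likely write for the "
            f"target item, then give a score from 1 to {scale_max}."
        )
    else:
        lines.append(
            f"Predict the score this user would give, from 1 to {scale_max}. "
            "Answer with a single integer."
        )
    return "\n".join(lines)
```

For example, `build_prompt([PastItem("Movie A", 5, "Loved it")], "Movie B", fmt="RS->RS")` returns a prompt that lists the past review and score and then asks for a review followed by a score. The string would be sent to whichever LLM client is in use; that call is left out because it depends on the provider.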

Paper Structure

This paper contains 62 sections, 2 equations, 24 figures, 8 tables.

Figures (24)

  • Figure 1: Illustration of the impact of in-context review data on LLM-based rating prediction performance. By leveraging rich, qualitative preference information from user reviews in the context, LLMs can more accurately infer a user’s preference for the target item, as demonstrated by the improved prediction from 1/5 to 5/5.
  • Figure 2: Average Spearman Correlation ($\uparrow$) (top) and RMSE ($\downarrow$) (bottom) with S $\to$ S and RS $\to$ S prompting. For the open-source models, error bars represent the standard deviation. The RS $\to$ S format consistently improves the Spearman Correlation and reduces the RMSE.
  • Figure 3: Comparison of extrapolation precision / recall on the Movies dataset. The models show reasonable precision, and the in-context review data improves the performance.
  • Figure 4: Comparison of Spearman Correlation on Movies (Similar) and Movies (Dissimilar). Although the models perform better on the Similar subset, in-context review texts improve performance even on the Dissimilar subset.
  • Figure 5: Comparison of rating prediction performance with and without the self-described preference generated by LLMs. The self-described preference does not work as effectively as the per-item reviews under the rating prediction settings.
  • ...and 19 more figures