STAR: A Simple Training-free Approach for Recommendations using Large Language Models

Dong-Ho Lee, Adam Kraft, Long Jin, Nikhil Mehta, Taibai Xu, Lichan Hong, Ed H. Chi, Xinyang Yi

TL;DR

STAR addresses the challenge of using LLMs for recommendation without fine-tuning. It is a two-stage framework. The Retrieval stage scores each candidate item $x$ against the $n$ items in the user's history by combining semantic similarity $R_S$, computed from LLM embeddings, with collaborative signals $R_C$: $\mathrm{score}(x)=\frac{1}{n}\sum_{j=1}^{n} r_j \lambda^{t_j}\left[ a\,R_S^{xj}+(1-a)\,R_C^{xj} \right]$, where $r_j$ is the user's rating of history item $j$, $\lambda^{t_j}$ is a temporal decay that down-weights older interactions, and $a$ weights the semantic term against the collaborative term. The Ranking stage has an LLM refine the top-$k$ retrieved candidates using a point-wise, pair-wise, or list-wise strategy. On Amazon Beauty, Toys & Games, and Sports & Outdoors, the retrieval stage alone is competitive with supervised baselines, and the full pipeline changes Hits@10 by +23.8%, +37.5%, and -1.8%, respectively, relative to the best supervised models. Combining semantic embeddings with collaborative signals and temporal decay improves retrieval, while LLM-based ranking adds further refinement that depends on careful prompt design. The result is a practical, training-free route to recommendation across domains that does not require domain-specific architectures.
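
The scoring rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `star_scores` and `top_k` are hypothetical, `R_S` and `R_C` are assumed to be precomputed item-by-item matrices, and $t_j$ is assumed to count back from the most recent interaction ($t_j = 1$ for the latest item).

```python
import numpy as np

def star_scores(history, ratings, R_S, R_C, a=0.5, lam=0.7):
    """Score every item against a user's history (hypothetical helper).

    history: item indices ordered oldest -> newest.
    ratings: the user's rating r_j for each history item, same order.
    R_S, R_C: (num_items, num_items) semantic / collaborative matrices.
    """
    history = np.asarray(history)
    ratings = np.asarray(ratings, dtype=float)
    n = len(history)
    # Assumption: t_j = 1 for the most recent item, n for the oldest.
    t = np.arange(n, 0, -1)
    weights = ratings * lam ** t                      # r_j * lambda^{t_j}
    # a * R_S^{xj} + (1 - a) * R_C^{xj} for every candidate x and history item j
    blended = a * R_S[:, history] + (1 - a) * R_C[:, history]
    scores = blended @ weights / n                    # (1/n) * sum over j
    scores[history] = -np.inf                         # do not re-recommend seen items
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring items, best first."""
    return np.argsort(-scores)[:k]
```

Masking already-seen items is a design choice of this sketch; the summary above does not say how the paper treats them.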

Abstract

Recent progress in large language models (LLMs) offers promising new approaches for recommendation system tasks. While the current state-of-the-art methods rely on fine-tuning LLMs to achieve optimal results, this process is costly and introduces significant engineering complexities. Conversely, methods that directly use LLMs without additional fine-tuning result in a large drop in recommendation quality, often due to the inability to capture collaborative information. In this paper, we propose a Simple Training-free Approach for Recommendation (STAR), a framework that utilizes LLMs and can be applied to various recommendation tasks without the need for fine-tuning, while maintaining high quality recommendation performance. Our approach involves a retrieval stage that uses semantic embeddings from LLMs combined with collaborative user information to retrieve candidate items. We then apply an LLM for pairwise ranking to enhance next-item prediction. Experimental results on the Amazon Review dataset show competitive performance for next item prediction, even with our retrieval stage alone. Our full method achieves Hits@10 performance of +23.8% on Beauty, +37.5% on Toys & Games, and -1.8% on Sports & Outdoors relative to the best supervised models. This framework offers an effective alternative to traditional supervised models, highlighting the potential of LLMs in recommendation systems without extensive training or custom architectures.
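
For readers unfamiliar with the metric quoted above, Hits@k is the fraction of test users whose held-out next item appears among the top k recommendations. A minimal sketch follows; the function name `hits_at_k` and the input layout are assumptions for illustration, not taken from the paper.

```python
def hits_at_k(ranked_lists, targets, k=10):
    """Fraction of users whose target item is in their top-k list.

    ranked_lists: one ranked list of item ids per user, best first.
    targets: the held-out next item for each user, same order.
    """
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)
```

The percentages in the abstract are relative changes in this metric against the best supervised baseline, so a negative value means the baseline still scores higher on that dataset.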

Paper Structure

This paper contains 42 sections, 1 equation, 6 figures, 8 tables.

Figures (6)

  • Figure 1: A Motivating Example. LLMs can be utilized in RecSys through (a) prompting, (b) fine-tuning on user-item interactions, and (c) using LLMs as feature encoders for training subsequent models. However, (a) cannot leverage collaborative knowledge, while (b) and (c) require extensive training and large-scale interaction data. Our framework STAR integrates collaborative knowledge into LLMs without additional training.
  • Figure 2: STAR Framework overview. We use the semantic relationship scores in $R_{\text{S}}$ and the collaborative relationship scores in $R_{\text{C}}$ to score each new candidate item against the items in the user's history. The final score for a new item is a weighted average of its semantic and collaborative relationship scores, further weighted by the user's ratings $r$ and a temporal decay $\lambda<1$ that prioritizes recent interactions. The top-scoring retrieved items are sent to the LLM ranking stage, where point-wise, pair-wise, or list-wise ranking can further improve the ordering of the recommended items.
  • Figure 3: Prompt overview for the ranking pipeline. The prompt includes history items, candidate items, and instructions for the ranking strategy. Each item is represented by metadata, along with additional details such as popularity and co-occurrence, formatted in JSON. The full prompt is available in the appendix.
  • Figure 4: Retrieval performance (Hits@50) for different values of the weighting factor $a$ between $R_{\text{S}}$ and $R_{\text{C}}$ (top), the recency factor $\lambda$ (bottom-left), and the number of history items $l$ (bottom-right). The shaded regions mark the best-performing range. $a=0.5$, $\lambda=0.7$, and $l=3$ perform best.
  • Figure 5: Pair-wise ranking performance (Hits@10) as a function of the number of history items $l$ and the number of candidates $k$.
  • ...and 1 more figures
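
The prompt layout described in the Figure 3 caption (history items, candidate items, and a strategy instruction, with items serialized as JSON) could be assembled roughly as follows. This is only a sketch: the function name `build_ranking_prompt`, the field names, and the instruction wording are all hypothetical, and the paper's actual prompt is the one in its appendix.

```python
import json

def build_ranking_prompt(history, candidates, strategy="pair-wise"):
    """Assemble a ranking prompt (hypothetical format, not the paper's).

    history, candidates: lists of dicts holding item metadata plus
    extra signals such as popularity or co-occurrence counts.
    strategy: "point-wise", "pair-wise", or "list-wise".
    """
    payload = {"history": history, "candidates": candidates}
    # Assumed instruction text; a real prompt would spell out the task in detail.
    instruction = (
        f"Rank the candidate items for this user using a {strategy} strategy."
    )
    # Instruction on the first line, JSON payload on the lines after it.
    return instruction + "\n" + json.dumps(payload, indent=2)
```

Keeping the item data in one JSON payload lets the same builder serve all three ranking strategies, with only the instruction line changing.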