Large Language Models as Narrative-Driven Recommenders

Lukas Eberhard, Thorsten Ruprechter, Denis Helic

TL;DR

The paper demonstrates that LLMs can generate contextually relevant movie recommendations, significantly outperforming other state-of-the-art approaches such as doc2vec.

Abstract

Narrative-driven recommenders aim to provide personalized suggestions for user requests expressed in free-form text such as "I want to watch a thriller with a mind-bending story, like Shutter Island." Although large language models (LLMs) have been shown to excel in processing general natural language queries, their effectiveness for handling such recommendation requests remains relatively unexplored. To close this gap, we compare the performance of 38 open- and closed-source LLMs of various sizes, such as Llama 3.2 and GPT-4o, in a movie recommendation setting. For this, we use a gold-standard, crowdworker-annotated dataset of posts from reddit's movie suggestion community and employ various prompting strategies, including zero-shot, identity, and few-shot prompting. Our findings demonstrate the ability of LLMs to generate contextually relevant movie recommendations, significantly outperforming other state-of-the-art approaches, such as doc2vec. While we find that closed-source and larger models generally perform best, medium-sized open-source models remain competitive, being only slightly outperformed by their more computationally expensive counterparts. Furthermore, we observe no significant differences across prompting strategies for most models, underscoring the effectiveness of simple approaches such as zero-shot prompting for narrative-driven recommendations. Overall, this work offers insights for recommender system researchers as well as practitioners aiming to integrate LLMs into real-world recommendation tools.
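The three prompting strategies differ only in which prompt templates are combined: according to Figure 1, zero-shot prompting uses the Task template alone, identity prompting adds a Persona to the Task, and few-shot prompting adds Examples to the Task. The following is a minimal Python sketch of that composition. The template wording, the JSON answer format, and the function name `build_prompt` are hypothetical illustrations, not the authors' actual prompts.

```python
from collections.abc import Sequence

# Hypothetical template texts; the paper's actual prompt wording is not reproduced here.
PERSONA = "You are a movie expert who gives personalized recommendations."
TASK = (
    "Recommend {k} movies that fit the user's request. "
    'Answer only with a JSON list of objects with "title" and "year" fields.'
)


def build_prompt(
    strategy: str,
    request: str,
    examples: Sequence[tuple[str, str]] = (),
    k: int = 10,
) -> str:
    """Combine the Persona, Task, and Examples templates according to the strategy."""
    task = TASK.format(k=k)
    request_part = f"Request: {request}"
    if strategy == "zero-shot":  # Task only
        parts = [task, request_part]
    elif strategy == "identity":  # Persona + Task
        parts = [PERSONA, task, request_part]
    elif strategy == "few-shot":  # Task + Examples (request/answer pairs)
        shots = [f"Request: {req}\nAnswer: {ans}" for req, ans in examples]
        parts = [task, *shots, request_part]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("identity", "A thriller with a mind-bending story, like Shutter Island."))
```

Keeping the Task text identical across all three strategies means that any difference in recommendation quality can be attributed to the added Persona or Examples alone.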

Paper Structure

This paper contains 13 sections, 35 figures, 2 tables.

Figures (35)

  • Figure 1: Evaluation of Movie Recommendations Using LLMs and Reddit Submissions. We assess LLM movie recommendations by combining different prompt templates (left) into zero-shot (Task), identity (Persona and Task), and few-shot (Task and Examples) prompting. We use these prompts with various LLMs (center) to generate suggestions for real movie requests submitted by reddit users (right). The LLM-generated recommendations (bottom center, dashed box) are compared to actual responses from reddit's community (right, dashed boxes) to evaluate the performance of LLMs as narrative-driven movie recommenders.
  • Figure 2: Valid JSON Format
  • Figure 3: 10 Recommendations
  • Figure 4: Unique Movies
  • Figure 5: Movies Released Before Request
  • ...and 30 more figures
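Figures 2 to 5 track four basic properties of each model's raw output: valid JSON, exactly 10 recommendations, unique movies, and movies released before the request. The sketch below checks these four properties for a single response. It assumes, hypothetically, that the model answers with a JSON list of objects with `title` and `year` fields, and it approximates "released before request" by comparing release years with the year of the reddit post; the paper may instead resolve titles against a movie database. The names `check_output` and `OutputChecks` are illustrative only.

```python
import json
from dataclasses import dataclass


@dataclass
class OutputChecks:
    valid_json: bool
    ten_recommendations: bool
    unique_movies: bool
    released_before_request: bool


def check_output(raw: str, request_year: int, k: int = 10) -> OutputChecks:
    """Check one raw LLM response against the four output properties (hypothetical format)."""
    try:
        recs = json.loads(raw)
    except json.JSONDecodeError:
        return OutputChecks(False, False, False, False)
    # Assumed schema: a list of objects with a string "title" and an integer "year".
    well_formed = isinstance(recs, list) and all(
        isinstance(r, dict)
        and isinstance(r.get("title"), str)
        and isinstance(r.get("year"), int)
        for r in recs
    )
    if not well_formed:
        return OutputChecks(False, False, False, False)
    # Treat case-insensitive title plus year as the movie's identity.
    keys = {(r["title"].strip().lower(), r["year"]) for r in recs}
    return OutputChecks(
        valid_json=True,
        ten_recommendations=len(recs) == k,
        unique_movies=len(keys) == len(recs),
        # Year granularity only: a same-year release counts as "before the request".
        released_before_request=all(r["year"] <= request_year for r in recs),
    )
```

Aggregating these flags over all requests for a model gives per-model rates comparable in spirit to the quantities named in Figures 2 to 5.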