ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback

Kyle Dylan Spurlock; Cagla Acun; Esin Saka; Olfa Nasraoui

TL;DR

This work investigates ChatGPT as a top-$n$ conversational recommender by building a rigorous reprompting pipeline that iteratively refines recommendations with user feedback. The methodology combines heterogeneous data, content-depth embeddings, and a structured evaluation framework with multiple prompts and feedback rounds. Key findings show that reprompting improves relevance and that popularity bias can be mitigated via prompt design and temperature control, with ChatGPT outperforming random baselines and approaching traditional baselines under certain conditions. The study demonstrates the practical potential of interactive LLM-based recommendations while highlighting limitations and directions for future comparisons with additional models and more current datasets.

Abstract

Recommendation algorithms have been pivotal in handling the overwhelming volume of online content. However, these algorithms seldom consider direct user input, resulting in superficial interaction between them. Efforts have been made to include the user directly in the recommendation process through conversation, but these systems too have had limited interactivity. Recently, Large Language Models (LLMs) like ChatGPT have gained popularity due to their ease of use and their ability to adapt dynamically to various tasks while responding to feedback. In this paper, we investigate the effectiveness of ChatGPT as a top-n conversational recommendation system. We build a rigorous pipeline around ChatGPT to simulate how a user might realistically probe the model for recommendations: by first instructing and then reprompting with feedback to refine a set of recommendations. We further explore the effect of popularity bias in ChatGPT's recommendations, and compare its performance to baseline models. We find that reprompting ChatGPT with feedback is an effective strategy to improve recommendation relevancy, and that popularity bias can be mitigated through prompt engineering.
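The instruct-then-reprompt cycle described in the abstract can be sketched as a small simulation. This is a minimal sketch and not the authors' pipeline: `toy_recommend` stands in for a call to ChatGPT, and the catalog, the relevant set, and the rule that unmatched items are excluded from later rounds are all assumptions made for illustration.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical toy catalog; a real pipeline would map model output to dataset items.
CATALOG = [f"m{i}" for i in range(20)]


def toy_recommend(liked: List[str], disliked: List[str], k: int) -> List[str]:
    """Stand-in for the LLM: return the first k catalog items not yet rejected."""
    candidates = [item for item in CATALOG if item not in disliked]
    return candidates[:k]


def reprompt_loop(
    recommend: Callable[[List[str], List[str], int], List[str]],
    relevant: Set[str],
    k: int = 10,
    rounds: int = 5,
) -> Tuple[List[Tuple[int, float]], List[str]]:
    """Ask for k items, then reprompt with matched/unmatched feedback each round."""
    liked: List[str] = []
    disliked: List[str] = []
    history: List[Tuple[int, float]] = []
    for prompt_number in range(1, rounds + 1):
        recs = recommend(liked, disliked, k)
        hits = [r for r in recs if r in relevant]
        misses = [r for r in recs if r not in relevant]
        # Precision of this round's list against the simulated user's relevant set.
        history.append((prompt_number, len(hits) / max(len(recs), 1)))
        # Feedback carried into the next prompt.
        liked += [h for h in hits if h not in liked]
        disliked += [m for m in misses if m not in disliked]
    return history, liked


history, liked = reprompt_loop(
    toy_recommend, relevant={"m1", "m3", "m4", "m6"}, k=4, rounds=3
)
print(history)
print(liked)
```

In this toy setting precision rises with each round only because rejected items are filtered out deterministically; a real language model gives no such guarantee, which is why the paper evaluates the effect empirically.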

Paper Structure (27 sections, 8 equations, 8 figures, 6 tables)

Introduction
Related Work
Prompt Engineering
Language Models as Recommenders
Algorithmic Recourse
METHODOLOGY
Data
Content Analysis
User and Item selection
Initial Prompt Creation
Recommendation, Extraction, and Mapping
Relevancy Matching
Reprompting with Feedback
Evaluation of Recommendations
System Parameterization
...and 12 more sections

Figures (8)

Figure 1: Proposed pipeline for evaluating the effect of conversation in recommendation. P=number of prompts, p=prompt number. Each section corresponds to the section of the same name in the methodology.
Figure 2: Cumulative distribution functions of item pairwise cosine similarity. Levels are based on the amount of content contained in the sentence embeddings produced by text-davinci-002. Level 4 contains the same content as level 3, but with stop words and the top 5% most common word-level tokens removed before embedding.
Figure 3: Coverage and precision distributions for different prompt numbers using the best configuration from Table \ref{tab:iterative-comparison}. The plots show that ChatGPT continues to match unique items in the feedback set while further increasing precision.
Figure 4: Recommendation frequency by item and density of two ChatGPT configurations. The model used has parameters $p=5$, $k=10$, $prompt\_style=$'zero'. Prompting with $temperature=1$ and specifying that popular recommendations should be limited ($prompt\_popular$=no) reduces the short-tail of recommendation frequency.
Figure B.1: Initial prompt choices. Parameter injection is in bold and contained in '{-}' but is represented as only the value in the actual text.
...and 3 more figures
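
The quantity plotted in Figure 2, a cumulative distribution over pairwise cosine similarities between item embeddings, can be computed as in the following sketch. It is illustrative only: the three two-dimensional vectors are made-up stand-ins for real sentence embeddings, and the helper names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(embeddings: List[Sequence[float]]) -> List[float]:
    """Sorted cosine similarity for every unordered pair of items."""
    return sorted(cosine(a, b) for a, b in combinations(embeddings, 2))


def ecdf(values: List[float], x: float) -> float:
    """Empirical CDF: fraction of values less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


# Made-up embeddings; real ones would come from an embedding model.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sims = pairwise_similarities(toy_embeddings)
print(sims)
print(ecdf(sims, 0.5))
```

A curve that rises at lower similarity values indicates that items are easier to tell apart at that content level, which is one way such a plot can inform the choice of how much content to embed.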
