ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback
Kyle Dylan Spurlock, Cagla Acun, Esin Saka, Olfa Nasraoui
TL;DR
This work investigates ChatGPT as a top-$n$ conversational recommender by building a rigorous reprompting pipeline that iteratively refines recommendations with user feedback. The methodology combines heterogeneous data, content-depth embeddings, and a structured evaluation framework with multiple prompts and feedback rounds. Key findings show that reprompting improves relevance and that popularity bias can be mitigated via prompt design and temperature control, with ChatGPT outperforming random baselines and approaching traditional baselines under certain conditions. The study demonstrates the practical potential of interactive LLM-based recommendations while highlighting limitations and directions for future comparisons with additional models and more current datasets.
Abstract
Recommendation algorithms have been pivotal in handling the overwhelming volume of online content. However, these algorithms seldom consider direct user input, resulting in superficial interaction between users and the systems that serve them. Efforts have been made to include the user directly in the recommendation process through conversation, but these systems have also had limited interactivity. Recently, Large Language Models (LLMs) like ChatGPT have gained popularity due to their ease of use and their ability to adapt dynamically to various tasks while responding to feedback. In this paper, we investigate the effectiveness of ChatGPT as a top-$n$ conversational recommendation system. We build a rigorous pipeline around ChatGPT to simulate how a user might realistically probe the model for recommendations: by first instructing and then reprompting with feedback to refine a set of recommendations. We further explore the effect of popularity bias in ChatGPT's recommendations, and compare its performance to baseline models. We find that reprompting ChatGPT with feedback is an effective strategy to improve recommendation relevance, and that popularity bias can be mitigated through prompt engineering.
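To make the instruct-then-reprompt idea concrete, the following is a minimal sketch of such a loop, not the pipeline used in this work. Everything in it is an assumption for illustration: the prompt wording, the function names (`build_prompt`, `reprompt_with_feedback`, `toy_model`), the tiny catalog, and the use of precision@n as the relevance measure. The `toy_model` function is a deterministic stand-in for ChatGPT so that the example runs without any API access.

```python
from typing import Callable, List, Set

# Hypothetical prompt markers; the real prompts used with ChatGPT are not shown here.
REJECT_PREFIX = "Not relevant, please replace: "
ACCEPT_PREFIX = "Relevant so far: "


def build_prompt(liked: List[str], n: int, good: List[str], bad: List[str]) -> str:
    """Compose an instruction prompt, adding feedback lines once feedback exists."""
    lines = [f"I liked: {', '.join(liked)}", f"Recommend {n} new items."]
    if good:
        lines.append(ACCEPT_PREFIX + ", ".join(good))
    if bad:
        lines.append(REJECT_PREFIX + ", ".join(bad))
    return "\n".join(lines)


def precision_at_n(recs: List[str], relevant: Set[str]) -> float:
    """Fraction of recommended items that are in the held-out relevant set."""
    return sum(r in relevant for r in recs) / len(recs) if recs else 0.0


def reprompt_with_feedback(
    ask_model: Callable[[str], List[str]],
    liked: List[str],
    relevant: Set[str],
    n: int = 3,
    rounds: int = 2,
) -> List[float]:
    """Run one initial prompt plus `rounds` reprompts; return precision@n per round."""
    good: List[str] = []
    bad: List[str] = []
    history: List[float] = []
    for _ in range(rounds + 1):
        recs = ask_model(build_prompt(liked, n, good, bad))[:n]
        history.append(precision_at_n(recs, relevant))
        # Simulated user feedback: hits are confirmed, misses are rejected.
        good = [r for r in recs if r in relevant]
        bad += [r for r in recs if r not in relevant and r not in bad]
    return history


# Toy stand-in for the LLM (an assumption): returns catalog items in a fixed
# order, skipping anything the prompt marks as not relevant.
CATALOG = ["Alien", "Brazil", "Casablanca", "Dune", "Eraserhead", "Fargo"]


def toy_model(prompt: str) -> List[str]:
    rejected: Set[str] = set()
    for line in prompt.splitlines():
        if line.startswith(REJECT_PREFIX):
            rejected = set(line[len(REJECT_PREFIX):].split(", "))
    return [item for item in CATALOG if item not in rejected]


if __name__ == "__main__":
    scores = reprompt_with_feedback(
        toy_model, liked=["Blade Runner"], relevant={"Brazil", "Dune", "Fargo"}
    )
    print(scores)
```

In this toy setting, each round of feedback removes rejected items from the candidate pool, so precision@n cannot decrease from one round to the next. A real LLM offers no such guarantee, which is why the effect of reprompting has to be measured empirically.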
