Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers

Lütfi Kerem Senel; Besnik Fetahu; Davis Yoshida; Zhiyu Chen; Giuseppe Castellucci; Nikhita Vedula; Jason Choi; Shervin Malmasi

TL;DR

This work tackles training-free optimization of generative recommenders by linking user feedback to LLM-based optimizers. It introduces a Generative Explore-Exploit framework that uses in-context learning to refine a pool of generated items based on implicit CTR signals, without fine-tuning LLMs. An offline user simulator with diverse personas evaluates two update strategies, full-ctr and Explore-Exploit, across the e-commerce and general knowledge domains. The results show that exploration is crucial for discovering hidden preferences and that CTR-informed exploitation yields robust improvements. The approach scales to large item spaces and could be extended to user-level personalization, while avoiding reward models and fine-tuning. Overall, training-free optimization with CTR-driven feedback significantly improves relevance and engagement, suggesting practical applications for open-ended tasks such as question generation.

Abstract

Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders. Such systems can directly generate items, including for open-set tasks like question suggestion. While the world knowledge of LLMs enables good recommendations, improving the generated content through user feedback is challenging, as continuously fine-tuning LLMs is prohibitively expensive. We present a training-free approach for optimizing generative recommenders by connecting user feedback loops to LLM-based optimizers. We propose a generative explore-exploit method that can not only exploit generated items with known high engagement, but also actively explore and discover hidden population preferences to improve recommendation quality. We evaluate our approach on question generation in two domains (e-commerce and general knowledge), and model user feedback with click-through rate (CTR). Experiments show that our LLM-based explore-exploit approach can iteratively improve recommendations and consistently increase CTR. Ablation analysis shows that generative exploration is key to learning user preferences, avoiding the pitfalls of greedy exploit-only approaches. A human evaluation strongly supports our quantitative findings.
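The explore-exploit idea in the abstract can be illustrated with a minimal sketch of one pool update. This is only one plausible reading, not the paper's implementation: the function `refine_pool`, the `Generator` type, the `toy_generator` stub (which stands in for an LLM optimizer prompt), and the example questions and CTR values are all hypothetical names and numbers introduced here for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical type: a stand-in for the LLM optimizer call. It receives the
# exploited items with their observed CTRs (the in-context feedback) and the
# number of new items wanted, and returns freshly generated items.
Generator = Callable[[Dict[str, float], int], List[str]]


def refine_pool(ctr_by_item: Dict[str, float],
                generate: Generator,
                n_exploit: int,
                n_explore: int) -> List[str]:
    """One training-free update: keep top-CTR items, then add new ones."""
    # Exploit: retain the items with the highest observed CTR.
    ranked = sorted(ctr_by_item, key=ctr_by_item.get, reverse=True)
    kept = ranked[:n_exploit]
    # Explore: ask the generator for new items, conditioned on the feedback.
    feedback = {item: ctr_by_item[item] for item in kept}
    seen = set(ctr_by_item)
    fresh = [q for q in generate(feedback, n_explore) if q not in seen]
    return kept + fresh[:n_explore]


def toy_generator(feedback: Dict[str, float], n: int) -> List[str]:
    # Placeholder for an LLM prompt; it only varies the best-performing item.
    best = max(feedback, key=feedback.get)
    return [f"{best} (variant {i + 1})" for i in range(n)]


if __name__ == "__main__":
    # Made-up CTR values, used only to exercise the function.
    observed = {"Is it waterproof?": 0.31,
                "What colors exist?": 0.05,
                "How long does the battery last?": 0.22}
    print(refine_pool(observed, toy_generator, n_exploit=2, n_explore=2))
```

Repeating this update over several feedback rounds, with CTRs re-measured each time, gives the iterative loop the abstract describes; no model weights change at any point, only the prompt context and the item pool.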

Paper Structure (44 sections, 3 equations, 13 figures, 7 tables)

Introduction
Related Work
Problem Definition
Generative Recommender Approach
full-ctr:
Explore-Exploit:
User Click Simulator
Relevance Scoring
Action Simulation
Experimental Setup
Data and Domains
User Personas
Approach Setup
LLM:
Approach Configurations
...and 29 more sections

Figures (13)

Figure 1: Overview of our generative recommender approach. It iteratively refines its item pool using feedback signals based on clicks to gradually improve the relevance of the questions to its user base.
Figure 2: Overview of our training-free generative recommendation approach. Our approach generates a question pool with maximal relevance to its underlying user population. Without any explicit signal about users' interests, it uses the click-through rate (CTR) of questions to iteratively refine which question forms and which aspects are generated. In the first iteration the questions are unlikely to be relevant to the user base; as CTR signal accumulates over multiple feedback rounds, the approach progressively improves question relevance.
Figure 3: Theoretical CTR values with $T=1.5$ for varying $RS$ and for 3 shown questions ($K=3$) with equal scores ranging from 1 to 10. The dashed vertical line ($RS=11$) shows the rejection score used in our experiments.
Figure 4: The plots on the left show the average question scores, and the plots on the right show the CTR, for the e-commerce domain and for personas with 1 and 3 preferences. For personas with a single preference, the results are averaged across 5 different personas (see Figure \ref{fig:shopping_single_preference}).
Figure 5: Average question scores and CTRs for the partial-ctr, CTR and Explore-Exploit methods on the general knowledge domain for personas with a single preference.
...and 8 more figures
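The Figure 3 caption mentions a temperature $T$, a rejection score $RS$, and $K$ shown questions, but the formula itself is not reproduced on this page. The sketch below shows how such a theoretical CTR could be computed under a loudly stated assumption: the simulated user picks among the $K$ questions and a "no click" option via a temperature-scaled softmax, with $RS$ acting as the score of the "no click" option. The function name `theoretical_ctr` and this softmax form are assumptions made for illustration, not the paper's confirmed definition.

```python
import math
from typing import Sequence


def theoretical_ctr(scores: Sequence[float], rejection_score: float,
                    temperature: float = 1.5) -> float:
    """Probability that any shown question is clicked (assumed softmax model)."""
    # One logit per shown question, plus a final logit for "no click".
    logits = [s / temperature for s in scores] + [rejection_score / temperature]
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    # The last weight belongs to the hypothetical "no click" option.
    return 1.0 - weights[-1] / sum(weights)


if __name__ == "__main__":
    # K=3 questions with equal scores, T=1.5, RS=11, as in the caption.
    for score in (1, 5, 10):
        print(score, round(theoretical_ctr([score] * 3, rejection_score=11), 3))
```

Under this assumed model, raising the rejection score lowers the CTR for any fixed set of question scores, which is consistent with the caption's framing of $RS$ as a threshold the shown questions must compete against.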
