Preference Discerning with LLM-Enhanced Generative Retrieval

Fabian Paischer, Liu Yang, Linfeng Liu, Shuai Shao, Kaveh Hassani, Jiacheng Li, Ricky Chen, Zhang Gabriel Li, Xiaoli Gao, Wei Shao, Xue Feng, Nima Noorshams, Sem Park, Bo Long, Hamid Eghbalzadeh

TL;DR

The paper introduces preference discerning, a paradigm that conditionally steers sequential recommendations using user preferences expressed in natural language. It decomposes the process into preference approximation via LLMs and preference conditioning within a multimodal generative retrieval model called Mender, which fuses semantic IDs with language-based context. A holistic benchmark across five steerability axes demonstrates that current generative retrievers struggle with dynamic adaptation, while Mender achieves state-of-the-art performance on several axes, including recommendation and fine-grained steering, especially when paired with larger language encoders. The work provides insights into how textual preferences can guide recommendations without retraining, highlights challenges in sentiment following, and outlines future avenues for scalable, language-aware, and neutral steering in recommender systems.

Abstract

In sequential recommendation, models recommend items based on a user's interaction history. To this end, current models usually incorporate information such as item descriptions and user intent or preferences. User preferences are usually not explicitly given in open-source datasets, and thus need to be approximated, for example via large language models (LLMs). Current approaches leverage approximated user preferences only during training and rely solely on the past interaction history for recommendations, limiting their ability to dynamically adapt to changing preferences and potentially reinforcing echo chambers. To address this issue, we propose a new paradigm, namely preference discerning, which explicitly conditions a generative recommendation model on user preferences in natural language within its context. To evaluate preference discerning, we introduce a novel benchmark that provides a holistic evaluation across various scenarios, including preference steering and sentiment following. Upon evaluating current state-of-the-art methods on our benchmark, we discover that their ability to dynamically adapt to evolving user preferences is limited. To address this, we propose a new method named Mender ($\textbf{M}$ultimodal Prefer$\textbf{en}$ce $\textbf{D}$iscern$\textbf{er}$), which achieves state-of-the-art performance in our benchmark. Our results show that Mender effectively adapts its recommendations guided by human preferences, even if not observed during training, paving the way toward more flexible recommendation models.

Paper Structure

This paper contains 32 sections, 4 equations, 19 figures, 9 tables, 1 algorithm.

Figures (19)

  • Figure 1: The preference discerning paradigm consists of two phases: preference approximation and preference conditioning. In the preference approximation phase, a pre-trained LLM is used to infer user preferences from user-specific data. In the preference conditioning phase, a sequential recommendation model is conditioned on the generated user preferences, enabling personalized recommendations.
  • Figure 2: Mender. The decoder generates semantic IDs conditioned on user preferences and interactions via cross-attention with a pre-trained language encoder.
  • Figure 3: The five evaluation axes for preference discerning that we focus on in this work: Preference-based Recommendation, Fine-grained steering, Coarse-grained steering, Sentiment following, and History Consolidation. Preferences highlighted in green are unseen during training.
  • Figure 4: Recall@10 for all methods on our novel benchmark, evaluating preference discerning across three subsets of the Amazon review dataset: Beauty, Sports and Outdoors, and Toys and Games. $\text{Mender}_{\text{Tok}}$ mostly outperforms generative retrieval competitors across Recommendation, Fine-grained steering, and History consolidation. All methods struggle on Sentiment following and Coarse-grained steering.
  • Figure 5: Steam panel: Recall@10 of different baselines trained on the default recommendation data of the Steam dataset. $\text{Mender}_{\text{Tok}}$ attains the highest performance on Recommendation, but all methods struggle on Steering and Sentiment following. Beauty panel: Recall@10 for $\text{Mender}_{\text{Tok}}$ trained on different data splits of the Amazon Beauty subset. $\text{Mender}_{\text{Tok}}\text{-All}$ leverages training data augmentation, resulting in a universal model that performs well across all axes of preference discerning.
  • ...and 14 more figures
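
Figures 4 and 5 report Recall@10. As a rough illustration of how such a metric is commonly computed, the sketch below assumes a leave-one-out setup with exactly one held-out target item per user, so that Recall@k reduces to the fraction of users whose target appears in their top-k list. The function name `recall_at_k` and the toy data are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Hashable, Sequence


def recall_at_k(
    ranked_items: Sequence[Sequence[Hashable]],
    targets: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Fraction of users whose single held-out target is in their top-k ranking.

    Assumption (not from the paper): one ground-truth item per user.
    """
    if len(ranked_items) != len(targets):
        raise ValueError("need exactly one ranked list per target")
    if not targets:
        return 0.0
    # A user counts as a hit when the target is among the first k items.
    hits = sum(
        target in ranking[:k]
        for ranking, target in zip(ranked_items, targets)
    )
    return hits / len(targets)


# Toy example: three users, each with a ranked candidate list and one target.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "x", "i"]
print(recall_at_k(rankings, targets, k=2))
print(recall_at_k(rankings, targets, k=3))
```

In this toy data, only the first user's target falls within the top 2, while the third user's target enters at rank 3, so the score rises as k grows.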