User Feedback Alignment for LLM-powered Exploration in Large-scale Recommendation Systems

Jianling Wang, Yifan Liu, Yinghao Sun, Xuejian Ma, Yueqi Wang, He Ma, Zhengyang Su, Minmin Chen, Mingyan Gao, Onkar Dalal, Ed H. Chi, Lichan Hong, Ningren Han, Haokai Lu

TL;DR

The paper addresses the challenge of exploring beyond established user preferences in large-scale recommendation systems by decoupling novelty generation from user-preference alignment, using two specialized LLMs. It applies inference-time scaling to generate multiple novel candidates, then uses a separate alignment model, trained on collective user feedback, to select the most relevant ones, balancing novelty and relevance. Live experiments on a production platform show improvements in user satisfaction and exploration diversity, including higher playback completion and increased UEUC, while pointwise labeling is faster and remains robust. Overall, the approach provides a scalable, effective means to enhance exploration without sacrificing relevance in real-world, high-traffic systems.
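The TL;DR mentions an alignment model trained on collective user feedback and the use of pointwise labels. As a minimal sketch, assuming feedback arrives as per-user events on (existing interest, novel interest) pairs and that a pointwise label is a thresholded positive rate aggregated across users, the aggregation step might look like the following. `FeedbackEvent`, `pointwise_labels`, `min_support`, and `threshold` are hypothetical names, not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackEvent:
    # Hypothetical log record: one user's reaction to a novel recommendation.
    seed_interest: str    # an interest the user already has
    novel_interest: str   # the interest proposed by the novelty LLM
    positive: bool        # assumed engagement signal (e.g. the user engaged)


def pointwise_labels(events, min_support=3, threshold=0.5):
    """Aggregate feedback across users into one binary label per
    (seed_interest, novel_interest) pair.

    Pairs seen fewer than `min_support` times are dropped as too noisy;
    the rest are labeled 1 if their positive rate reaches `threshold`."""
    counts = defaultdict(lambda: [0, 0])  # pair -> [positives, total]
    for e in events:
        c = counts[(e.seed_interest, e.novel_interest)]
        c[0] += int(e.positive)
        c[1] += 1
    labels = {}
    for pair, (pos, total) in counts.items():
        if total >= min_support:
            labels[pair] = int(pos / total >= threshold)
    return labels


# Toy usage with made-up interests.
events = (
    [FeedbackEvent("jazz", "bossa nova", True)] * 3
    + [FeedbackEvent("jazz", "death metal", False)] * 2
    + [FeedbackEvent("jazz", "death metal", True)]
    + [FeedbackEvent("cooking", "gardening", True)]  # too few events, dropped
)
labels = pointwise_labels(events)
print(labels)
```

Aggregating over many users before labeling is one way to turn sparse, noisy individual reactions into a collective signal; the support and threshold values here are arbitrary.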

Abstract

Exploration, the act of broadening user experiences beyond their established preferences, is challenging in large-scale recommendation systems due to feedback loops and limited signals on user exploration patterns. Large Language Models (LLMs) offer potential solutions by leveraging their world knowledge to recommend novel content outside these loops. A key challenge is aligning LLMs with user preferences while preserving their knowledge and reasoning. To enhance planning for new user interests using LLMs, this paper introduces a novel approach that combines hierarchical planning with LLM inference-time scaling. This method aims to improve recommendation relevance without compromising novelty. We decouple novelty and user alignment, training separate LLMs for each objective. We then scale up the novelty-focused LLM's inference and select the best-of-n predictions using the user-aligned LLM. Live experiments demonstrate efficacy, showing significant gains in both user satisfaction (measured by watch activity and active user counts) and exploration diversity.
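The selection step described in the abstract (sample several predictions from the novelty LLM, then keep the one the user-aligned LLM prefers) is a form of best-of-n selection. The sketch below assumes both models can be wrapped as plain callables; `best_of_n`, `generate`, `score`, and the toy stubs are hypothetical stand-ins for real LLM calls, not the paper's serving code.

```python
from typing import Callable, Sequence


def best_of_n(user_history: Sequence[str],
              generate: Callable[[Sequence[str], int], str],
              score: Callable[[Sequence[str], str], float],
              n: int = 8) -> str:
    """Draw n candidate novel interests from the novelty model and return
    the candidate the alignment model scores highest."""
    # Each sample gets a different seed, mimicking stochastic decoding.
    candidates = [generate(user_history, seed) for seed in range(n)]
    # Deduplicate while preserving order so repeated samples are scored once.
    unique = list(dict.fromkeys(candidates))
    return max(unique, key=lambda c: score(user_history, c))


# Hypothetical stubs: a "novelty LLM" that cycles through a fixed pool and
# an "alignment LLM" that looks up a made-up preference score.
POOL = ["bossa nova", "film scores", "death metal", "swing dance"]
ALIGN = {"bossa nova": 0.81, "film scores": 0.64,
         "death metal": 0.12, "swing dance": 0.55}


def toy_novelty_llm(history: Sequence[str], seed: int) -> str:
    return POOL[seed % len(POOL)]


def toy_alignment_llm(history: Sequence[str], candidate: str) -> float:
    return ALIGN.get(candidate, 0.0)


choice = best_of_n(["jazz", "blues"], toy_novelty_llm, toy_alignment_llm, n=8)
print(choice)
```

Raising `n` trades extra novelty-model inference for a better chance that a well-aligned candidate appears among the samples; deduplication avoids paying for repeated alignment-model calls when samples coincide.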

Paper Structure

This paper contains 11 sections and 4 figures.

Figures (4)

  • Figure 1: Hierarchical planning paradigm: the novelty LLM performs high-level planning for novel interest transitions, which are used to restrict the predictions of classic recommender models, and user feedback on these novel recommendations is aggregated to train a separate alignment LLM.
  • Figure 2: The alignment model trained with collective user feedback can effectively predict user preference over new labels.
  • Figure 3: Alignment Model Finetuning and Evaluation.
  • Figure 4: (a) The proposed method still recommends the highest percentage of novelty compared to the rest of the system. (b)(c)(d) Compared to the novelty model baseline, the alignment model further expands users' interests with higher user satisfaction.
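Figure 1 describes the novelty LLM's high-level plan being used to restrict the predictions of classic recommender models, and Figure 4(a) reports a percentage of novelty. The sketch below illustrates both ideas under stated assumptions: every item maps to one interest cluster, the plan is a set of cluster labels, and "novelty" is the share of recommended items from clusters absent from the user's history. `restrict_to_plan`, `novelty_share`, and all data are hypothetical, not the paper's implementation or metric definition.

```python
from typing import Dict, List, Set, Tuple


def restrict_to_plan(scored_items: List[Tuple[str, float]],
                     item_cluster: Dict[str, str],
                     planned_clusters: Set[str],
                     k: int) -> List[str]:
    """Keep only items whose cluster was planned by the novelty LLM,
    then return the top-k by the classic recommender's score.
    Items with no known cluster are excluded."""
    allowed = [(item, s) for item, s in scored_items
               if item_cluster.get(item) in planned_clusters]
    allowed.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in allowed[:k]]


def novelty_share(recommended: List[str],
                  item_cluster: Dict[str, str],
                  history_clusters: Set[str]) -> float:
    """Fraction of recommended items whose cluster is not in the user's
    history. Assumes every recommended item has a known cluster."""
    if not recommended:
        return 0.0
    novel = sum(item_cluster[i] not in history_clusters for i in recommended)
    return novel / len(recommended)


# Toy usage with made-up items, clusters, and recommender scores.
item_cluster = {"v1": "jazz", "v2": "bossa nova", "v3": "bossa nova",
                "v4": "film scores", "v5": "cooking"}
scored = [("v1", 0.9), ("v2", 0.4), ("v3", 0.7), ("v4", 0.6), ("v5", 0.8)]
recs = restrict_to_plan(scored, item_cluster, {"bossa nova", "film scores"}, k=2)
share = novelty_share(recs, item_cluster, {"jazz", "cooking"})
print(recs, share)
```

Keeping the LLM at the cluster level and leaving item ranking to the existing recommender is what makes the plan "high-level": the LLM decides where to explore, and the classic model decides which items to show within that region.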