DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation

Bowen Zheng, Xiaolei Wang, Enze Liu, Xi Wang, Lu Hongyu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen

TL;DR

DeepRec introduces an autonomous multi-turn reasoning-retrieval framework that couples large language models with recommendation retrievers to deeply explore the item space. By employing a preference-aware TRM for data rollout, a hierarchical RL reward structure, and a two-stage RL training process, it significantly improves sequential recommendation performance over traditional and prior LLM-based baselines. The approach demonstrates strong empirical gains on public datasets and shows promising generalization to unseen domains, highlighting the potential of reasoning-enabled LLM agents that interact with external tools for adaptive, knowledge-rich recommender systems. This work paves the way for more intelligent, autonomous, and flexible recommendation pipelines that can exploit long-horizon reasoning and world knowledge without frequent LLM fine-tuning.

Abstract

Recently, large language models (LLMs) have been introduced into recommender systems (RSs), either to enhance traditional recommendation models (TRMs) or to serve as recommendation backbones. However, existing LLM-based RSs often fail to combine the complementary advantages of LLMs (e.g., world knowledge and reasoning) and TRMs (e.g., recommendation-specific knowledge and efficiency) to fully explore the item space. To address this, we propose DeepRec, a novel LLM-based RS that enables autonomous multi-turn interactions between LLMs and TRMs for deep exploration of the item space. In each interaction turn, LLMs reason over user preferences and interact with TRMs to retrieve candidate items. After multi-turn interactions, LLMs rank the retrieved items to generate the final recommendations. We adopt reinforcement learning (RL) based optimization and propose novel designs from three aspects: recommendation-model-based data rollout, recommendation-oriented hierarchical rewards, and a two-stage RL training strategy. For data rollout, we introduce a preference-aware TRM, with which LLMs interact to construct trajectory data. For rewards, we design a hierarchical reward function that involves both process-level and outcome-level rewards to optimize the interaction process and recommendation performance, respectively. For RL training, we develop a two-stage training strategy, where the first stage aims to guide LLMs to interact with TRMs and the second stage focuses on performance improvement. Experiments on public datasets demonstrate that DeepRec significantly outperforms both traditional and LLM-based baselines, offering a new paradigm for deep exploration in recommender systems.
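To make the idea of a hierarchical reward concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not spell out. It assumes a process-level term that checks whether a trajectory is well formed and invokes the TRM often enough, and an outcome-level term based on NDCG@k for a single target item. The gating rule (no outcome reward unless the process check passes), the function names, and the parameters `min_calls` and `k` are all hypothetical.

```python
import math
from typing import List


def process_reward(num_trm_calls: int, well_formed: bool, min_calls: int = 1) -> float:
    """Hypothetical process-level reward: 1.0 if the trajectory is well formed
    and calls the TRM at least `min_calls` times, otherwise 0.0."""
    return 1.0 if well_formed and num_trm_calls >= min_calls else 0.0


def outcome_reward(ranked_items: List[str], target: str, k: int = 10) -> float:
    """Hypothetical outcome-level reward: NDCG@k with one relevant item."""
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target)          # 0-based position of the target
    return 1.0 / math.log2(rank + 2)    # ideal DCG is 1.0 for a single target


def hierarchical_reward(
    num_trm_calls: int,
    well_formed: bool,
    ranked_items: List[str],
    target: str,
    min_calls: int = 1,
    k: int = 10,
) -> float:
    """Assumed gating: the outcome term counts only when the process term is earned."""
    p = process_reward(num_trm_calls, well_formed, min_calls)
    if p == 0.0:
        return 0.0
    return p + outcome_reward(ranked_items, target, k)
```

Under this assumed design, a trajectory that never calls the TRM earns nothing even if its final list happens to contain the target, which pushes the policy toward learning the interaction pattern before optimizing ranking quality.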

Paper Structure

This paper contains 23 sections, 14 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Comparison between existing LLM-based RSs and ours.
  • Figure 2: Overview of our approach DeepRec. $X$ is the user’s interaction history input to the LLM, and $Y$ is the final recommendation list output by the LLM. $t_i$ and $p_i$ denote the thought and the user preference generated by the LLM at the $i$-th turn, respectively. $I_i$ represents the items retrieved by the TRM based on $p_i$.
  • Figure 3: The performance impact of different rewards.
  • Figure 4: Performance w.r.t. initial invocation count.
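
The notation in the Figure 2 caption ($X$, $t_i$, $p_i$, $I_i$, $Y$) can be read as a simple loop. The sketch below is an illustration under stated assumptions, not the authors' implementation: the LLM and the TRM are replaced by hypothetical callables (`reason`, `retrieve`, `rank`), the LLM signals that it is done by returning `None` as the preference, and `max_turns` is an assumed safety cap. The toy stand-ins at the bottom exist only to make the example runnable.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces standing in for the LLM and the TRM.
ReasonFn = Callable[[List[str], List[List[str]]], Tuple[str, Optional[str]]]
RetrieveFn = Callable[[str], List[str]]
RankFn = Callable[[List[str], List[str]], List[str]]


def multi_turn_recommend(
    X: List[str],
    reason: ReasonFn,
    retrieve: RetrieveFn,
    rank: RankFn,
    max_turns: int = 5,
) -> List[str]:
    """Run reasoning-retrieval turns, then rank the pooled candidates into Y."""
    retrieved: List[List[str]] = []
    for _ in range(max_turns):
        t_i, p_i = reason(X, retrieved)   # thought and preference at turn i
        if p_i is None:                   # assumed stop signal from the LLM
            break
        I_i = retrieve(p_i)               # items retrieved by the TRM for p_i
        retrieved.append(I_i)
    # Pool candidates across turns, dropping duplicates but keeping order.
    candidates = list(dict.fromkeys(item for I_i in retrieved for item in I_i))
    return rank(X, candidates)            # final recommendation list Y


# Toy stand-ins (purely illustrative).
_CATALOG = {"sci-fi": ["Dune", "Foundation"], "space opera": ["Foundation", "Hyperion"]}


def toy_reason(X: List[str], retrieved: List[List[str]]) -> Tuple[str, Optional[str]]:
    turn = len(retrieved)
    if turn >= 2:
        return ("enough candidates collected", None)
    return (f"exploring at turn {turn}", ["sci-fi", "space opera"][turn])


def toy_retrieve(p: str) -> List[str]:
    return _CATALOG.get(p, [])


def toy_rank(X: List[str], candidates: List[str]) -> List[str]:
    return sorted(candidates)


Y = multi_turn_recommend(["Neuromancer"], toy_reason, toy_retrieve, toy_rank)
```

With the toy stand-ins, two turns retrieve overlapping item sets, and the pooled, de-duplicated candidates are ranked once at the end.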