Reasoning to Rank: An End-to-End Solution for Exploiting Large Language Models for Recommendation

Kehan Zheng; Deyao Hong; Qian Li; Jun Zhang; Huan Yu; Jie Jiang; Hongning Wang

TL;DR

This work proposes Reasoning to Rank, an end-to-end framework that internalizes recommendation utility optimization into the step-by-step reasoning of LLMs and trains the LLM with reinforcement learning.

Abstract

Recommender systems are tasked with inferring users' evolving preferences and ranking items that align with their intents, which calls for in-depth reasoning beyond pattern-based scoring. Recent efforts have started to leverage large language models (LLMs) for recommendation, but how to effectively optimize the model for improved recommendation utility remains underexplored. In this work, we propose Reasoning to Rank, an end-to-end training framework that internalizes recommendation utility optimization into the learning of step-by-step reasoning in LLMs. To avoid position bias in LLM reasoning and enable direct optimization of the reasoning process, our framework performs reasoning at the user-item level and employs reinforcement learning for end-to-end training of the LLM. Experiments on three Amazon datasets and a large-scale industrial dataset show consistent gains over strong conventional and LLM-based solutions. Extensive in-depth analyses validate the necessity of the key components in the proposed framework and shed light on future developments of this line of work.
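The abstract describes reasoning at the user-item level and aggregating the per-item results into a ranking, and the outline names a Plackett-Luce surrogate for the reinforcement learning stage. As a minimal sketch of that general idea only, the snippet below computes the standard Plackett-Luce log-probability of a ranking from per-item scores. It is not the paper's objective: the function names and the plain float scores are hypothetical stand-ins for whatever the LLM's item-level reasoning produces.

```python
import math


def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (item indices, best first) under a
    standard Plackett-Luce model. `scores` are hypothetical per-item
    utilities (higher means more preferred), not the paper's actual scores."""
    remaining = list(ranking)
    log_prob = 0.0
    for item in ranking:
        # Log-sum-exp over items not yet placed, stabilized by the max score.
        m = max(scores[j] for j in remaining)
        log_z = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        log_prob += scores[item] - log_z
        remaining.remove(item)
    return log_prob


def rank_by_score(scores):
    """Aggregate independent per-item scores into one ranking (best first)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Usage: three candidates scored independently, then ranked.
scores = [1.0, 0.5, -0.3]
ranking = rank_by_score(scores)
print(ranking, plackett_luce_log_prob(scores, ranking))
```

Because each candidate is scored on its own, the resulting ranking does not depend on the order in which candidates are presented, which is one way to read the abstract's point about avoiding position bias.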

Paper Structure (26 sections, 8 equations, 6 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Methodology
Preliminaries
Framework Overview
Reinforcement Learning under Plackett-Luce Surrogate
Self-reflective SFT Initialization
Experiments
Experimental Setup
Datasets and Metrics
Baselines
Backbone and Training Variants
Main Results
Ablation Study
Effect of Joint Training with the LLM
...and 11 more sections

Figures (6)

Figure 1: Overview of the proposed Reasoning to Rank framework. The LLM performs item-level reasoning for each candidate item and aggregates them to produce the final ranking. The reasoning process is trained end-to-end via reinforcement learning for directly optimizing the final recommendation utility.
Figure 2: The self-reflective prompt template adopted in R2Rank.
Figure 3: Comparison under item cold-start settings on Video Games and Industrial datasets.
Figure 4: Influence of user history length on recommendation quality.
Figure 5: Candidate position sensitivity of DeepSeek-R1 on the Musical Instruments dataset.
...and 1 more figure
