RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation

Sichun Luo, Bowei He, Haohan Zhao, Wei Shao, Yanlin Qi, Yinya Huang, Aojun Zhou, Yuxuan Yao, Zongpeng Li, Yuanzhang Xiao, Mingjie Zhan, Linqi Song

TL;DR

RecRanker tackles the mismatch between general-purpose LLMs and top-k recommendation tasks by coupling instruction tuning with adaptive data collection, prompt enhancements, and a hybrid ranking method that ensembles pointwise, pairwise, and listwise signals. It introduces three core components (adaptive user sampling, prompt augmentation with conventional recommender signals, and position shifting to reduce position bias) and trains an instruction-tuned LLM to act as the Ranker. Empirical results on three real-world datasets show gains over strong backbone models, with the hybrid ranking improving both direct and sequential recommendation, and further analysis indicates that larger models and more tuning data help. The work demonstrates the practical viability of LLM-based ranking, identifies computational cost as the current bottleneck, and suggests future directions such as distillation and model optimization for large-scale deployment.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities and have been extensively deployed across various domains, including recommender systems. Prior research has employed specialized prompts to leverage the in-context learning capabilities of LLMs for recommendation purposes. More recent studies have utilized instruction tuning techniques to align LLMs with human preferences, promising more effective recommendations. However, existing methods suffer from several limitations. The full potential of LLMs is not fully elicited due to low-quality tuning data and the overlooked integration of conventional recommender signals. Furthermore, LLMs may generate inconsistent responses for different ranking tasks in the recommendation, potentially leading to unreliable results. In this paper, we introduce RecRanker, tailored for instruction tuning LLMs to serve as the Ranker for top-k Recommendations. Specifically, we introduce an adaptive sampling module for sampling high-quality, representative, and diverse training data. To enhance the prompt, we introduce a position shifting strategy to mitigate position bias and augment the prompt with auxiliary information from conventional recommendation models, thereby enriching the contextual understanding of the LLM. Subsequently, we utilize the sampled data to assemble an instruction-tuning dataset with the augmented prompts comprising three distinct ranking tasks: pointwise, pairwise, and listwise rankings. We further propose a hybrid ranking method to enhance the model performance by ensembling these ranking tasks. Our empirical evaluations demonstrate the effectiveness of our proposed RecRanker in both direct and sequential recommendation scenarios.
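
To make the idea of ensembling the three ranking tasks more concrete, the following is a minimal sketch of one way pointwise, pairwise, and listwise outputs could be merged into a single top-k list. The abstract does not specify how the ensemble is computed, so the function name `hybrid_rank`, the per-task utility definitions, and the equal default weights are all illustrative assumptions rather than the paper's formulation.

```python
from collections import defaultdict


def hybrid_rank(pointwise, pairwise, listwise, weights=(1.0, 1.0, 1.0), k=3):
    """Merge three ranking signals into one top-k list (illustrative only).

    pointwise: dict mapping item -> predicted rating from the LLM
    pairwise:  list of (preferred_item, other_item) comparisons
    listwise:  list of items ordered best-first
    weights:   hypothetical per-task weights (pointwise, pairwise, listwise)
    """
    w_point, w_pair, w_list = weights
    utility = defaultdict(float)

    # Pointwise: the predicted rating is used directly as utility.
    for item, rating in pointwise.items():
        utility[item] += w_point * rating

    # Pairwise: each comparison gives one unit of utility to the preferred item.
    for preferred, other in pairwise:
        utility[preferred] += w_pair
        utility[other] += 0.0  # make sure the other item is still a candidate

    # Listwise: earlier positions receive more utility, scaled to (0, 1].
    n = len(listwise)
    for position, item in enumerate(listwise):
        utility[item] += w_list * (n - position) / n

    # Sort by descending utility; break ties by item id for determinism.
    ranked = sorted(utility, key=lambda item: (-utility[item], item))
    return ranked[:k]


top_items = hybrid_rank(
    pointwise={"A": 4, "B": 5, "C": 3},
    pairwise=[("A", "B"), ("A", "C")],
    listwise=["A", "C", "B"],
    k=2,
)
```

A weighted sum is only one possible design; the weights would need tuning per dataset, and setting a weight to zero recovers a two-task ensemble.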

Paper Structure (40 sections, 10 equations, 8 figures, 7 tables, 1 algorithm)

Figures (8)

  • Figure 1: An example that illustrates the application of RecRanker for the top-k recommendation scenario.
  • Figure 2: (i) The overall training pipeline of RecRanker. (ii) The adaptive user sampling module, which combines importance-aware sampling, clustering-based sampling, and a penalty for repetitive sampling to select users. For each sampled user, candidate items are randomly selected from the items the user liked, disliked, and had no interaction with. (iii) Prompt construction, which incorporates position shifting and prompt enhancement strategies to improve model performance.
  • Figure 3: (i) The overall inference pipeline of RecRanker. (ii) Candidate item selection, where a retrieval model scores each item and the highest-scoring items are selected as candidates. (iii) Comparison of the proposed hybrid ranking method with the three individual ranking tasks during the inference stage.
  • Figure 4: Evaluation of the changes after reranking on ML-1M dataset with backbone model SGL. W2R: the wrong recommendation is changed to right. R2W: the right recommendation is altered to wrong. W2W: the wrong recommendation remains unchanged. R2R: the correct recommendation remains unchanged.
  • Figure 5: Evaluation of the changes after reranking on Bookcrossing dataset with backbone model SGL.
  • ...and 3 more figures
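
The four outcome labels defined in the Figure 4 caption (W2R, R2W, W2W, R2R) can be tallied with a few lines of code. The sketch below assumes, purely for illustration, that each user has a single recommended item before and after reranking and a set of ground-truth relevant items; the caption does not state the granularity used in the paper, and the names `transition` and `count_transitions` are hypothetical.

```python
from collections import Counter

# Map (correct before reranking, correct after reranking) to the caption's labels.
LABELS = {
    (False, True): "W2R",   # wrong recommendation changed to right
    (True, False): "R2W",   # right recommendation altered to wrong
    (False, False): "W2W",  # wrong recommendation stays wrong
    (True, True): "R2R",    # right recommendation stays right
}


def transition(before_item, after_item, relevant_items):
    """Label one user's change in recommendation correctness."""
    return LABELS[(before_item in relevant_items, after_item in relevant_items)]


def count_transitions(before, after, ground_truth):
    """Count the four outcome labels over all users.

    before, after: dict mapping user -> recommended item (assumed top-1)
    ground_truth:  dict mapping user -> set of relevant items
    """
    counts = Counter()
    for user, relevant_items in ground_truth.items():
        counts[transition(before[user], after[user], relevant_items)] += 1
    return counts


counts = count_transitions(
    before={"u1": "a", "u2": "b", "u3": "c"},
    after={"u1": "x", "u2": "b", "u3": "y"},
    ground_truth={"u1": {"x"}, "u2": {"b"}, "u3": {"c"}},
)
```

Under this reading, reranking helps overall when the W2R count exceeds the R2W count.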