Make Large Language Model a Better Ranker

Wen-Shuo Chao; Zhi Zheng; Hengshu Zhu; Hao Liu

TL;DR

The paper tackles the misalignment between large language model (LLM) generation and ranking tasks in recommender systems by proposing ALRO, a framework that combines Soft Lambda Loss (SLL) and Permutation-Sensitive Loss (PSL) within a supervised fine-tuning regime. ALRO casts ranking as a language-generation problem with explicit feedback, uses soft-argmax to obtain a differentiable ranking signal, and mitigates input-order bias through PSL. With these components it achieves higher $NDCG@k$ than embedding-based and other LLM-based baselines across multiple datasets and backbone models. The approach shows notable gains with larger models and is more efficient than bootstrapping strategies, which suggests it is practical for LLM-driven re-ranking. Overall, ALRO advances the integration of ranking objectives into LLMs, improving recommendation quality while keeping inference costs feasible for real-world systems.
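
The TL;DR names two ingredients of SLL: a soft-argmax that yields a differentiable ranking signal, and a lambda-style loss. The paper's exact SLL formulation is not reproduced here. The following is a minimal pure-Python sketch that rests on an assumption: for each candidate, the model exposes logits over output positions (`position_logits`, hypothetical). Soft-argmax turns each row into an expected position, and a LambdaRank-style loss then weights pairwise logistic errors by |ΔNDCG|. The names `soft_argmax` and `lambda_loss` and the ratings are illustrative only. The code computes only the loss value; in a training framework, autodiff would differentiate the same arithmetic.

```python
import math
from typing import Sequence


def softmax(xs: Sequence[float]) -> list[float]:
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax(logits: Sequence[float], beta: float = 1.0) -> float:
    """Differentiable surrogate for argmax: expected index under softmax(beta * logits).

    Larger beta pushes the result toward the hard argmax index.
    """
    probs = softmax([beta * x for x in logits])
    return sum(i * p for i, p in enumerate(probs))


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def lambda_loss(scores: Sequence[float], relevances: Sequence[float], sigma: float = 1.0) -> float:
    """Hypothetical LambdaRank-style loss (NOT the paper's exact SLL).

    Each pair (i, j) with rel_i > rel_j contributes a logistic loss on the
    score gap, weighted by |delta NDCG| from swapping i and j in the hard
    ranking induced by the current scores (weights treated as constants).
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    position = {item: pos for pos, item in enumerate(order, start=1)}
    ideal = sorted(relevances, reverse=True)
    idcg = sum((2.0 ** r - 1.0) / math.log2(p + 1) for p, r in enumerate(ideal, start=1))
    if idcg == 0.0:
        return 0.0  # nothing relevant to rank
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if relevances[i] <= relevances[j]:
                continue  # only pairs where i should be ranked above j
            gain_diff = 2.0 ** relevances[i] - 2.0 ** relevances[j]
            disc_diff = 1.0 / math.log2(position[i] + 1) - 1.0 / math.log2(position[j] + 1)
            delta_ndcg = abs(gain_diff * disc_diff) / idcg
            # log(1 + exp(-sigma * (s_i - s_j))): small when s_i clearly exceeds s_j.
            loss += delta_ndcg * softplus(-sigma * (scores[i] - scores[j]))
    return loss


# Hypothetical setup: per-candidate logits over 3 output positions (index 0 = top).
position_logits = [
    [0.2, 2.0, 0.1],  # candidate A: most likely placed second
    [3.0, 0.5, 0.0],  # candidate B: most likely placed first
    [0.0, 0.3, 2.5],  # candidate C: most likely placed last
]
ratings = [5, 4, 1]  # hypothetical explicit feedback for A, B, C
# An earlier expected position maps to a higher score.
scores = [-soft_argmax(row, beta=2.0) for row in position_logits]
print(round(lambda_loss(scores, ratings), 4))
```

In this toy example the model places B above A although A has the higher rating, so the loss is positive. Reordering the logits so that A comes first would lower it.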

Abstract

Large Language Models (LLMs) demonstrate robust capabilities across various fields, leading to a paradigm shift in LLM-enhanced Recommender Systems (RS). Research to date focuses mainly on point-wise and pair-wise recommendation paradigms, which are inefficient for LLM-based recommenders because of their high computational costs. Existing list-wise approaches, however, also fall short in ranking tasks because ranking objectives are misaligned with next-token prediction. Moreover, these LLM-based methods struggle to capture the order relations among candidates, particularly given the scale of ratings. To address these challenges, this paper introduces a large language model framework with Aligned Listwise Ranking Objectives (ALRO). ALRO is designed to bridge the gap between the capabilities of LLMs and the nuanced requirements of ranking tasks. Specifically, ALRO employs explicit feedback in a listwise manner by introducing soft lambda loss, a customized adaptation of lambda loss designed to optimize order relations. This mechanism provides more accurate optimization goals and thereby improves the ranking process. Additionally, ALRO incorporates a permutation-sensitive learning mechanism that addresses position bias, a prevalent issue in generative models, without imposing additional computational burden during inference. Our evaluation shows that ALRO outperforms both existing embedding-based recommendation methods and LLM-based recommendation baselines.
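
The abstract says that permutation-sensitive learning counters position bias without adding inference cost. One way to realize that idea is sketched below in pure Python. The sketch is an assumption and not the paper's exact PSL: during training, the same candidate set is scored in its original order and in a permuted order, the permuted scores are mapped back to the original item order, and the symmetric KL divergence between the two softmax distributions is penalized. `scorer` stands in for the LLM. `fair_scorer`, `biased_scorer`, and the item ids are hypothetical toys.

```python
import math
from typing import Callable, Sequence

# A scorer maps an ordered list of item ids to one score per input position.
Scorer = Callable[[Sequence[str]], list[float]]


def softmax(xs: Sequence[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def permutation_consistency_loss(scorer: Scorer, candidates: Sequence[str], perm: Sequence[int]) -> float:
    """Hypothetical permutation-sensitive penalty (NOT the paper's exact PSL).

    Returns the symmetric KL divergence between the score distributions for
    the original order and for the order given by `perm`, after realignment.
    It is zero when the scorer ignores input order.
    """
    shuffled = [candidates[k] for k in perm]
    scores_orig = scorer(candidates)
    scores_shuf = scorer(shuffled)
    # scores_shuf[pos] belongs to candidates[perm[pos]]; undo the permutation.
    realigned = [0.0] * len(candidates)
    for pos, k in enumerate(perm):
        realigned[k] = scores_shuf[pos]
    p, q = softmax(scores_orig), softmax(realigned)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


item_quality = {"i1": 2.0, "i2": 1.0, "i3": 0.5, "i4": -1.0}


def fair_scorer(items: Sequence[str]) -> list[float]:
    # Depends only on the item itself.
    return [item_quality[x] for x in items]


def biased_scorer(items: Sequence[str]) -> list[float]:
    # Adds a bonus that decays with input position: a toy position bias.
    return [item_quality[x] + 1.5 / (pos + 1) for pos, x in enumerate(items)]


cands = ["i1", "i2", "i3", "i4"]
reverse_perm = [3, 2, 1, 0]  # training would typically sample a random permutation
print(permutation_consistency_loss(fair_scorer, cands, reverse_perm))
print(permutation_consistency_loss(biased_scorer, cands, reverse_perm))
```

Under this assumption the penalty would be added to the fine-tuning loss with some weight. At inference the model runs once on a single input order, which matches the abstract's claim of no extra inference cost.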

Paper Structure (26 sections, 9 equations, 4 figures, 5 tables)

Introduction
Related Works
Large Language Model for Recommendation
Learning to Rank
Problem Statement
Methodology
Template Design
Supervised Fine-Tuning
Soft Lambda Loss (SLL)
Permutation-Sensitive Loss (PSL)
Training Objective
Experiment
Dataset
Baselines and Evaluation Metrics
Implementation Details
...and 11 more sections

Figures (4)

Figure 1: The comparison of point-wise, pair-wise, and list-wise ranking in LLM-based recommendation.
Figure 2: Demonstration of position bias. The figure shows how the placement of candidate items in the input sequence can significantly alter the ranking produced by a language model.
Figure 3: Ablation study on multiple datasets.
Figure 4: Improvements achieved by ALRO across various model sizes on the Movie dataset, measured by NDCG@10.
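
Figure 4 and the TL;DR report results as NDCG@k. The sketch below computes NDCG@k over a re-ranked candidate list. It assumes the common exponential gain 2^rel - 1 with a log2(position + 1) discount and takes the ideal ordering over the same candidate set. The paper's exact gain function is not stated here, and some libraries use a linear gain instead. The ratings in the example are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, in ranked order (positions start at 1)."""
    return sum(
        (2.0 ** rel - 1.0) / math.log2(pos + 1)
        for pos, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances: Sequence[float], k: int) -> float:
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant item among the candidates
    return dcg_at_k(ranked_relevances, k) / ideal


# Hypothetical explicit 1-5 ratings, listed in the order the model ranked the candidates.
ratings_in_predicted_order = [3, 5, 1, 4, 2]
print(round(ndcg_at_k(ratings_in_predicted_order, k=3), 4))
```

A perfectly sorted list scores 1.0. Placing lower-rated items in the top positions lowers the score, and errors near the top cost the most because of the logarithmic discount.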

Theorems & Definitions (1)

Definition 1
