R1-Ranker: Teaching LLM Rankers to Reason

Tao Feng; Zhigang Hua; Zijie Lei; Yan Xie; Shuang Yang; Bo Long; Jiaxuan You

TL;DR

This paper addresses the unification of diverse ranking tasks for LLM-based rankers by introducing R1-Ranker, a reasoning-driven reinforcement learning framework. It presents two designs: DRanker, which produces a full ranking in one shot, and IRanker, which ranks by iterative exclusion, enabling deeper reasoning over a reduced output space. Across nine datasets spanning recommendation, routing, and passage ranking, IRanker-3B achieves state-of-the-art performance among general baselines and is competitive with domain-specific methods, with a 15.7% average relative improvement. Zero-shot experiments show transfer to out-of-domain tasks, and IRanker's reasoning traces improve the performance of other LLMs. The work suggests that a unified, reasoning-focused foundation model can handle diverse ranking problems and lay the groundwork for efficient, scalable LLM-based ranking systems.

Abstract

Large language models (LLMs) have recently shown strong reasoning abilities in domains like mathematics, coding, and scientific problem-solving, yet their potential for ranking tasks, where prime examples include retrieval, recommender systems, and LLM routing, remains underexplored. Ranking requires complex reasoning across heterogeneous candidates, but existing LLM-based rankers are often domain-specific, tied to fixed backbones, and lack iterative refinement, limiting their ability to fully exploit LLMs' reasoning potential. To address these challenges, we propose R1-Ranker, a reasoning-incentive framework built on reinforcement learning, with two complementary designs: DRanker, which generates full rankings in one shot, and IRanker, which decomposes ranking into an iterative elimination process with step-wise rewards to encourage deeper reasoning. We evaluate unified R1-Rankers on nine datasets spanning recommendation, routing, and passage ranking, showing that IRanker-3B consistently achieves state-of-the-art performance, surpasses larger 7B models on some tasks, and yields a 15.7% average relative improvement. Ablation and generalization experiments further confirm the critical role of reinforcement learning and iterative reasoning, with IRanker-3B improving zero-shot performance by over 9% on out-of-domain tasks and reasoning traces boosting other LLMs by up to 22.87%. These results demonstrate that unifying diverse ranking tasks with a single reasoning-driven foundation model is both effective and essential for advancing LLM reasoning in ranking scenarios.
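The iterative elimination idea behind IRanker can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the word-overlap scorer is a toy stand-in for the LLM's reasoning step, and the step-wise rewards used during reinforcement learning are not modeled. It only shows how repeatedly excluding the weakest candidate yields a full ranking while each step decides on a single item.

```python
from typing import Callable, List, Sequence


def iterative_exclusion_rank(
    candidates: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """Return candidates ordered best-first by repeatedly excluding the weakest.

    `pick_worst` is a hypothetical stand-in for one reasoning step of the
    ranker: given the remaining pool, it names the candidate to exclude.
    """
    pool = list(candidates)
    excluded: List[str] = []  # filled worst-first
    while len(pool) > 1:
        worst = pick_worst(pool)
        pool.remove(worst)
        excluded.append(worst)
    excluded.extend(pool)  # the last survivor is the top-ranked item
    return excluded[::-1]


def make_overlap_picker(query: str) -> Callable[[List[str]], str]:
    """Toy scorer (an assumption for illustration): fewest shared words loses."""
    query_terms = set(query.lower().split())

    def pick_worst(pool: List[str]) -> str:
        # min() returns the first candidate with the smallest overlap.
        return min(pool, key=lambda c: len(query_terms & set(c.lower().split())))

    return pick_worst


query = "reinforcement learning for ranking"
candidates = [
    "a survey of cooking recipes",
    "ranking with reinforcement learning",
    "learning to rank passages",
]
ranked = iterative_exclusion_rank(candidates, make_overlap_picker(query))
print(ranked)
```

Each call to `pick_worst` chooses one item from the remaining pool, so the per-step output space shrinks as the pool shrinks, in contrast to a one-shot ranker that must emit the whole permutation at once.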
