GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning

Duolin Sun, Meixiu Long, Dan Yang, Yihan Jiao, Zhehao Tan, Jie Feng, Junjie Wang, Yue Shen, Peng Wei, Jian Wang, Jinjie Gu

TL;DR

GroupRank introduces a groupwise reranking paradigm to bridge the gap between pointwise and listwise methods in retrieval-augmented generation. It uses a two-stage training pipeline with supervised fine-tuning and heterogeneous reward-guided reinforcement learning (GRPO), plus a high-quality synthetic data generation pipeline that combines Pointwise and Listwise annotations to produce ground-truth scores. Empirical results on BRIGHT and R2MED show state-of-the-art performance at 7B and 32B scales, with competitive results on BEIR, and a favorable efficiency profile due to groupwise parallelism, achieving approximately $O(N/c)$ LLM calls. This work provides a scalable, flexible reranking framework that leverages LLM reasoning to improve complex, reasoning-intensive retrieval tasks.

Abstract

Large Language Models have shown strong potential as rerankers to enhance the overall performance of RAG systems. However, existing reranking paradigms are constrained by a core theoretical and practical dilemma. Pointwise methods, while simple and highly flexible, evaluate documents independently, which makes them prone to the Ranking Myopia Trap: they overlook the relative importance of documents. In contrast, Listwise methods can perceive the global ranking context but suffer from inherent List Rigidity, leading to severe scalability and flexibility issues when handling large candidate sets. To address these challenges, we propose Groupwise, a novel reranking paradigm. In this approach, the query and a group of candidate documents are jointly fed into the model, which performs within-group comparisons to assign an individual relevance score to each document. This design retains the flexibility of Pointwise methods while enabling the comparative capability of Listwise methods. We further adopt GRPO for model training, equipped with a heterogeneous reward function that integrates ranking metrics with a distributional reward aimed at aligning score distributions across groups. To overcome the bottleneck caused by the scarcity of high-quality labeled data, we further propose a pipeline for synthesizing high-quality retrieval and ranking data. The resulting data can be leveraged not only for training the reranker but also for training the retriever. Extensive experiments on two reasoning-intensive retrieval benchmarks, BRIGHT and R2MED, validate the effectiveness of our approach.
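The groupwise scoring loop described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `groupwise_rerank`, `num_llm_calls`, and `toy_overlap_scorer` are hypothetical, and the toy scorer is a word-overlap stand-in for the LLM call, which in the paper sees the whole group at once and compares documents within it. The sketch only shows how $N$ candidates are split into groups of $c$ documents, scored in concurrent requests, and merged by score.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: one request scores a whole group of documents.
GroupScorer = Callable[[str, Sequence[str]], List[float]]


def num_llm_calls(num_docs: int, group_size: int) -> int:
    """Number of groupwise requests needed for num_docs candidates."""
    return ceil(num_docs / group_size)


def groupwise_rerank(
    query: str,
    docs: Sequence[str],
    score_group: GroupScorer,
    group_size: int,
) -> List[Tuple[str, float]]:
    """Split docs into groups, score each group in one request, merge by score."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    groups = [docs[i:i + group_size] for i in range(0, len(docs), group_size)]
    # Groups are independent of each other, so requests can run concurrently.
    with ThreadPoolExecutor() as pool:
        group_scores = list(pool.map(lambda g: score_group(query, g), groups))
    scored: List[Tuple[str, float]] = []
    for group, scores in zip(groups, group_scores):
        if len(scores) != len(group):
            raise ValueError("scorer must return one score per document")
        scored.extend(zip(group, scores))
    # Each document carries its own score, so a global sort gives the ranking.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_overlap_scorer(query: str, group: Sequence[str]) -> List[float]:
    """Placeholder for the LLM call: counts words shared with the query."""
    query_words = set(query.lower().split())
    return [float(len(query_words & set(d.lower().split()))) for d in group]
```

Because every document receives an individual score, results from different groups can be merged with a plain sort. This only works well if scores are comparable across groups, which is the property the distributional reward in the abstract is meant to encourage.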

Paper Structure

This paper contains 38 sections, 12 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Comparison of reranking paradigms. For ranking $N$ documents given a query, three primary approaches are compared by complexity and efficiency. The Pointwise method operates without inter-document comparison, allowing exact, concurrent solving with $O(N)$ complexity, but yields poorer performance. The Listwise approach approximates the ranking with sliding windows and achieves better results thanks to comparison, but it is sequential and non-concurrent, with $O(N/w)$ complexity. Our Groupwise method compares documents and outputs a score for each, offering high effectiveness. Crucially, it retains the benefits of exact solving and concurrency, achieving $O(N/c)$ complexity. Here, $c$ denotes the number of documents compared per request and $w$ denotes the sliding window.
  • Figure 2: Workflow for High-Quality Training Data Generation. After filtering candidate documents via hybrid retrieval, we employ two parallel annotation methods: Pointwise (LLM-based individual scoring) and Listwise (LLM-based holistic ranking). Finally, we apply a weighted fusion to these two sets of annotations to generate highly reliable final scores and a ranked list. This output is well suited to training GroupRank and can also be used for training retrievers or other rerankers.
  • Figure 3: The two-stage training paradigm for the Groupwise reranker, designed to combine the flexibility of pointwise methods with the high performance of listwise approaches. Within the reinforcement learning (RL) component, the reward function incorporates not only a format reward and a recall reward but also our novel ranking reward and distribution reward, both specifically designed for GroupRank.
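The weighted fusion step mentioned in the Figure 2 caption could look roughly like the Python sketch below. The caption does not give the fusion formula, so every detail here is an assumption made for illustration: the function name `fuse_annotations`, the 0 to `max_score` pointwise scale, the linear rank-to-score mapping, and the default `weight` of 0.5 are all hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_annotations(
    pointwise_scores: Dict[str, float],
    listwise_ranking: List[str],
    weight: float = 0.5,
    max_score: float = 10.0,
) -> List[Tuple[str, float]]:
    """Blend per-document scores with a ranked list into one fused score.

    weight is the share given to the pointwise annotation; the remainder
    goes to the listwise annotation. Both are mapped to [0, 1] first.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(set(listwise_ranking)) != len(listwise_ranking):
        raise ValueError("listwise ranking contains duplicate documents")
    if set(pointwise_scores) != set(listwise_ranking):
        raise ValueError("both annotations must cover the same documents")

    n = len(listwise_ranking)
    fused: Dict[str, float] = {}
    for rank, doc_id in enumerate(listwise_ranking):
        # Assumed pointwise scale: 0..max_score, rescaled to [0, 1].
        point = pointwise_scores[doc_id] / max_score
        # Assumed rank mapping: top of the list -> 1.0, bottom -> 0.0.
        listwise = 1.0 if n == 1 else 1.0 - rank / (n - 1)
        fused[doc_id] = weight * point + (1.0 - weight) * listwise
    # Sorting by fused score yields the final ranked list.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

For example, with pointwise scores `{"a": 8.0, "b": 6.0, "c": 2.0}` and listwise ranking `["b", "a", "c"]`, equal weighting places `b` first: its top listwise position outweighs the slightly higher pointwise score of `a`.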