MCRanker: Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers

Fang Guo, Wenyu Li, Honglei Zhuang, Yun Luo, Yafu Li, Le Yan, Qi Zhu, Yue Zhang

TL;DR

This work tackles the inconsistency and incompleteness of zero-shot pointwise LLM ranking by introducing MCRanker, a framework that generates multi-perspective, query-centric criteria via a virtual annotation team (an NLP scientist plus recruited collaborators) and aggregates the team's assessments through score-based ensembling. The framework is formalized as a four-step process: Team Recruiting, Criteria Generation, Passage Evaluation, and Score Ensemble. MCRanker produces more consistent relevance scores and improves average NDCG@10 over strong baselines across eight BEIR datasets. The results underline the value of query-specific criteria and diverse perspectives, show that ensembling and larger rating scales can further improve performance, and indicate that the framework is robust across LLM variants. Overall, the paper demonstrates that multi-perspective criteria generation can substantially improve pointwise LLM rankers and offers a foundation for extending this paradigm to broader ranking tasks.
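The four steps named above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the `LLM` callable interface, the prompt wording, the function names, the default of two collaborators, and the use of a plain mean as the score ensemble. The `fake_llm` stub only stands in for a real model so that the sketch runs end to end.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any function mapping a prompt string to reply text.
LLM = Callable[[str], str]


def recruit_team(llm: LLM, query: str, n_collaborators: int = 2) -> List[str]:
    """Step 1 (Team Recruiting): an NLP scientist plus recruited collaborators."""
    reply = llm(f"List {n_collaborators} expert roles, one per line, "
                f"suited to judging passages for the query: {query}")
    roles = [line.strip() for line in reply.splitlines() if line.strip()]
    return ["NLP scientist"] + roles[:n_collaborators]


def generate_criteria(llm: LLM, query: str, role: str) -> str:
    """Step 2 (Criteria Generation): query-specific criteria from one perspective."""
    return llm(f"As a {role}, write criteria for scoring how relevant "
               f"a passage is to the query: {query}")


def evaluate_passage(llm: LLM, query: str, passage: str,
                     role: str, criteria: str, k: int = 10) -> float:
    """Step 3 (Passage Evaluation): one member rates one passage from 0 to k."""
    reply = llm(f"As a {role}, using these criteria:\n{criteria}\n"
                f"Rate from 0 to {k} how relevant the passage is to the query.\n"
                f"Query: {query}\nPassage: {passage}\nScore:")
    return float(reply.strip())


def rank_passages(llm: LLM, query: str, passages: List[str],
                  k: int = 10) -> List[Tuple[str, float]]:
    """Run all four steps; Step 4 (Score Ensemble) is assumed to be a mean."""
    team = recruit_team(llm, query)
    criteria: Dict[str, str] = {r: generate_criteria(llm, query, r) for r in team}
    scored = []
    for passage in passages:
        scores = [evaluate_passage(llm, query, passage, r, criteria[r], k)
                  for r in team]
        scored.append((passage, mean(scores)))  # assumed ensemble: average score
    return sorted(scored, key=lambda item: item[1], reverse=True)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real LLM, used only to exercise the sketch."""
    if prompt.startswith("List"):
        return "physician\nnutritionist"
    if "write criteria" in prompt:
        return "Prefer passages that directly answer the query."
    passage = prompt.split("Passage:", 1)[1]
    return "9" if "sunlight" in passage else "2"


ranked = rank_passages(
    fake_llm,
    "sources of vitamin D",
    ["Bananas are rich in potassium.",
     "Skin makes vitamin D when exposed to sunlight."],
)
```

Note that criteria are generated once per query and team member and then reused for every passage, which is one way to read the claim that shared criteria make pointwise scores more consistent.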

Abstract

The most recent pointwise Large Language Model (LLM) rankers have achieved remarkable ranking results. However, these rankers are hindered by two major drawbacks: (1) they fail to follow standardized comparison guidance during the ranking process, and (2) they struggle to take all relevant considerations into account when dealing with complicated passages. To address these shortcomings, we propose to build a ranker that generates ranking scores based on a set of criteria from various perspectives. These criteria are intended to direct each perspective in providing a distinct yet synergistic evaluation. Our research, which examines eight datasets from the BEIR benchmark, demonstrates that incorporating this multi-perspective criteria ensemble approach markedly enhances the performance of pointwise LLM rankers.
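The performance gains are reported in terms of NDCG@10. As a reminder of what that metric measures, here is a minimal sketch assuming the linear-gain form with a log2 position discount; some implementations use an exponential gain of 2^rel - 1 instead, and the function names and the optional `all_rels` parameter are illustrative.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(rels: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k relevance labels (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))


def ndcg_at_k(ranked_rels: Sequence[float],
              all_rels: Optional[Sequence[float]] = None,
              k: int = 10) -> float:
    """DCG of the produced ranking divided by the DCG of the ideal ranking.

    ranked_rels: relevance labels in the order the ranker returned the passages.
    all_rels: labels of every judged passage for the query; defaults to
              ranked_rels when the ranker has scored all judged passages.
    """
    pool = ranked_rels if all_rels is None else all_rels
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0
```

For example, `ndcg_at_k([1, 0])` is 1.0 because the only relevant passage is ranked first, while `ndcg_at_k([0, 1])` drops to 1 / log2(3), roughly 0.63, because the same passage is ranked second.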

Paper Structure

This paper contains 29 sections, 4 equations, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: Pipeline of the proposed MCRanker (blue dashed line) and example output of the pointwise LLM-based ranker (orange solid line).
  • Figure 2: Effect of the number of team members
  • Figure 3: MCRanker with different values of the rating scale k
  • Figure 4: Comparing performance of different base models
  • Figure 5: MCRanker spots a subtle semantic difference between the query and an irrelevant passage.
  • ...and 2 more figures