Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou

TL;DR

This work investigates the feasibility and efficiency of full ranking using long-context LLMs for listwise passage ranking, comparing it to the traditional sliding-window approach. It identifies two key limitations of applying existing listwise training to full ranking and proposes a multi-pass label construction plus an importance-aware loss to address them. In zero-shot settings, full ranking is more efficient but less effective, while supervised fine-tuning with a RankMistral_100 backbone yields strong improvements over the sliding-window baseline and reduces API costs by roughly half. The findings demonstrate that full ranking, when properly trained, offers substantial practical benefits for scalable retrieval with long-context LLMs, and provide concrete methods for label generation and objective design to enhance ranking performance.

Abstract

Large Language Models (LLMs) have shown exciting performance in listwise passage ranking. Due to the limited input length, existing methods often adopt the sliding window strategy. Such a strategy, though effective, is inefficient because it involves repetitive and serialized processing, which usually re-evaluates relevant passages multiple times. As a result, it incurs redundant API costs, which are proportional to the number of inference tokens. The development of long-context LLMs enables the full ranking of all passages within a single inference, avoiding redundant API costs. In this paper, we conduct a comprehensive study of long-context LLMs for ranking tasks in terms of efficiency and effectiveness. Surprisingly, our experiments reveal that full ranking with long-context LLMs can deliver superior performance in the supervised fine-tuning setting with a huge efficiency improvement. Furthermore, we identify two limitations of fine-tuning the full ranking model based on existing methods: (1) the sliding window strategy fails to produce a full ranking list as a training label, and (2) the language modeling loss cannot emphasize top-ranked passage IDs in the label. To alleviate these issues, we propose a new complete listwise label construction approach and a novel importance-aware learning objective for full ranking. Experiments show the superior performance of our method over baselines. Our code is available at https://github.com/8421BCD/fullrank.
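
To make the redundancy argument concrete, the sketch below counts how many passage slots a back-to-front sliding window reads when reranking 100 candidates, compared with a single full-ranking call. The window size of 20 and step of 10 are assumptions (common settings in listwise reranking work, not values stated in this abstract), and `sliding_window_spans` is a hypothetical helper name. Passage count stands in for token count, assuming passages of similar length.

```python
def sliding_window_spans(n_passages, window=20, step=10):
    """Return the (start, end) spans visited by a back-to-front sliding window.

    window and step are assumed values chosen for illustration.
    """
    spans = []
    end = n_passages
    while True:
        start = max(0, end - window)
        spans.append((start, end))
        if start == 0:
            break
        end -= step
    return spans


n_passages = 100
spans = sliding_window_spans(n_passages)

# Each span is one serialized LLM call; overlapping spans re-read passages.
passages_read_sliding = sum(end - start for start, end in spans)
# Full ranking reads every passage exactly once, in one call.
passages_read_full = n_passages

print(len(spans))             # 9 calls
print(passages_read_sliding)  # 180 passage slots
print(passages_read_full)     # 100 passage slots
```

Under these assumed settings the sliding window issues nine sequential calls and reads 180 passage slots, while full ranking reads 100 in one call. This is only an input-side estimate; it ignores prompt overhead and output tokens.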

Paper Structure

This paper contains 29 sections, 2 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: The sliding window strategy and full ranking strategy are shown in part (a) and part (b), respectively. The bar chart shows the comparison between our fine-tuned sliding window model and full ranking model in terms of NDCG@10 and latency (per query) on TREC DL19 dataset.
  • Figure 2: The training method of the full ranking model. We first use a multi-pass sliding window approach to iteratively obtain the full ranking list of passages. Then, we design an importance-aware loss that assigns different weights to the IDs in the label for model optimization.
  • Figure 3: Latency of ranking top-100 passages based on full ranking and sliding window strategy. "Output Top-10 ID" indicates that the LLM only generates the top-10 ranked passage IDs.
  • Figure 4: Comparison of sliding window strategy and full ranking strategy on DL19 dataset based on Mistral-7B-instruct-v0.3 and GPT-4o, respectively.
  • Figure 5: Comparison of the API cost per query of the sliding window strategy and the full ranking strategy when ranking the top-100 retrieved passages.
  • ...and 1 more figure
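
The training recipe described in the Figure 2 caption can be sketched roughly as follows. This is one reading of the caption, not the authors' implementation: it assumes each sliding-window pass is trusted only for its top 10 results, which are appended to the label before the remaining passages are reranked. The weight formula `1 + 1/log2(rank + 1)` is an assumed DCG-style discount chosen for illustration, and the paper's actual weighting may differ. All function names are hypothetical, and `oracle` is a toy stand-in for an LLM call on one window.

```python
import math


def sliding_window_rerank(ids, rank_window, window=20, step=10):
    """One back-to-front sliding-window pass over ids (assumed settings)."""
    ids = list(ids)
    end = len(ids)
    while True:
        start = max(0, end - window)
        ids[start:end] = rank_window(ids[start:end])
        if start == 0:
            break
        end -= step
    return ids


def multi_pass_full_label(ids, rank_window, top_k=10):
    """Build a complete ranking by repeating sliding-window passes.

    After each pass only the top_k positions are kept as final;
    the remaining passages are reranked in the next pass.
    """
    remaining, label = list(ids), []
    while remaining:
        ranked = sliding_window_rerank(remaining, rank_window)
        label.extend(ranked[:top_k])
        remaining = ranked[top_k:]
    return label


def importance_weights(n_ids):
    """Assumed discount: earlier ranks in the label get larger weights."""
    return [1.0 + 1.0 / math.log2(rank + 1) for rank in range(1, n_ids + 1)]


def weighted_nll(log_probs, weights):
    """Weighted negative log-likelihood over the passage IDs in the label."""
    return -sum(w * lp for w, lp in zip(weights, log_probs))


# Toy relevance scores (all distinct) and a toy window ranker that sorts
# one window by score; a real system would call an LLM here instead.
scores = {i: (i * 37) % 101 for i in range(100)}


def oracle(window_ids):
    return sorted(window_ids, key=scores.get, reverse=True)


label = multi_pass_full_label(range(100), oracle)
weights = importance_weights(len(label))
```

With the toy ranker, the multi-pass label matches a full sort by score, and the weights decrease from 2.0 at rank 1 toward 1.0 at the tail, so errors on top-ranked IDs contribute more to the loss than errors near the bottom.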