KeyB2+: Summary-Augmented Block Selection for Scalable Long-Document Reranking with LLMs

Minghan Li, Eric Gaussier, Juntao Li, Guodong Zhou

TL;DR

The paper tackles the inefficiency and noise challenges of long-document reranking with decoder-based LLMs. It introduces KeyB2 and KeyB2+ to explicitly select and combine the most relevant blocks under a token budget, with KeyB2+ optionally appending a lightweight, query-agnostic summary to provide global context. Through attention analysis, the authors show that indiscriminate long-context processing dilutes relevance signals, motivating the block-selection strategy. Empirically, KeyB2 and KeyB2+ deliver strong improvements across four benchmarks, achieving a new state-of-the-art on TREC DL 2019 and forming a favorable Pareto frontier that markedly reduces latency. The approach offers a practical, modular path to scalable, effective LLM-based long-document reranking that can adapt to different selectors and languages.

Abstract

Large language models (LLMs) have advanced neural information retrieval (IR), yet applying them to long-document reranking remains computationally expensive and often ineffective when irrelevant content dominates. We begin with an in-depth analysis of decoder-only LLM attention and show that while some heads align with relevance signals, this alignment quickly deteriorates as irrelevant text accumulates. These observations highlight the necessity of explicit block selection to preserve focus and efficiency. We present KeyB2 and KeyB2+, a scalable reranking framework that selects and aggregates the most relevant blocks together with a summary of each document, ensuring that both localized evidence and global semantics are captured before LLM scoring. The KeyB2 family supports flexible selectors (BM25, bi-encoder, and cross-encoder) and adapts decoder-only LLMs to compute fine-grained relevance scores on the selected content. Experiments demonstrate that summary-augmented block selection consistently improves retrieval effectiveness over strong baselines while substantially lowering inference cost, achieving a new state-of-the-art result on the TREC DL 2019 document track (NDCG@10 of 0.738). This establishes KeyB2+ as a practical and effective solution for scalable long-document reranking with LLMs.
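
To make the block-selection idea concrete, here is a minimal Python sketch of choosing the highest-scoring blocks under a token budget with a BM25 selector and an optional summary. It is an illustration under stated assumptions, not the authors' implementation: the names `bm25_scores` and `select_blocks`, whitespace tokenization, fixed-size blocks, the BM25 parameters, greedy filling of the budget, restoring the original block order, and placing the summary after the blocks are all hypothetical choices.

```python
import math
from collections import Counter


def bm25_scores(query_terms, blocks, k1=1.2, b=0.75):
    """Score each block (a list of lowercase tokens) against the query with BM25.

    Each block is treated as a tiny "document"; k1 and b are common defaults,
    not values taken from the paper.
    """
    n = len(blocks)
    if n == 0:
        return []
    avg_len = sum(len(block) for block in blocks) / n
    doc_freq = Counter()
    for block in blocks:
        doc_freq.update(set(block))  # count each term once per block
    scores = []
    for block in blocks:
        term_freq = Counter(block)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(block) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_blocks(query, document, token_budget, summary=None, block_size=64):
    """Return the selected blocks (in document order) plus the optional summary.

    Assumption: tokens are whitespace-separated words, and the summary's
    tokens are charged against the budget before any block is picked.
    """
    words = document.split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    lowered = [[w.lower() for w in block] for block in blocks]
    scores = bm25_scores(query.lower().split(), lowered)

    summary_words = summary.split() if summary else []
    remaining = token_budget - len(summary_words)

    # Greedily take blocks from highest to lowest score while they still fit.
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(blocks[i]) <= remaining:
            chosen.append(i)
            remaining -= len(blocks[i])

    # Restore the original order so the reranker sees coherent text.
    parts = [" ".join(blocks[i]) for i in sorted(chosen)]
    if summary_words:
        parts.append(" ".join(summary_words))
    return " ".join(parts)
```

In this sketch, swapping `bm25_scores` for a bi-encoder or cross-encoder scorer would leave `select_blocks` unchanged, which mirrors the selector flexibility the abstract describes. The assembled text would then be passed, together with the query, to the LLM reranker for scoring.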

Paper Structure

This paper contains 56 sections, 17 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Effectiveness (NDCG@10) vs. reranking latency (per 100 docs) on TREC DL’19. Top-left is better.
  • Figure 2: Doc $\!\rightarrow\!$ query attention mass (mean over heads/layers) on 500 pairs. Axes: layers (vertical) and heads (horizontal); (a) clean relevant documents; (b) with $L_{\text{noise}}{=}800$ tokens inserted before; (c) with $L_{\text{noise}}{=}800$ tokens inserted after. After-noise causes stronger dispersion (lower mass on query) than before-noise.
  • Figure 3: ARAS (Spearman $\rho$) and PCR (positive-rate of ARAS) under varying noise lengths/positions (averaged over 500 pairs). Noise consistently reduces alignment; inserting noise after the relevant content is more harmful than before.
  • Figure 4: The architecture of the proposed KeyB2+ approach. Without the summarization step, it reduces to the KeyB2 approach.
  • Figure 5: Example attention map of layer 31, head 24, which potentially captures relevance information.