KeyB2+: Summary-Augmented Block Selection for Scalable Long-Document Reranking with LLMs
Minghan Li, Eric Gaussier, Juntao Li, Guodong Zhou
TL;DR
The paper tackles the inefficiency and noise challenges of long-document reranking with decoder-based LLMs. It introduces KeyB2 and KeyB2+ to explicitly select and combine the most relevant blocks under a token budget, with KeyB2+ optionally appending a lightweight, query-agnostic summary to provide global context. Through attention analysis, the authors show that indiscriminate long-context processing dilutes relevance signals, motivating the block-selection strategy. Empirically, KeyB2 and KeyB2+ deliver strong improvements across four benchmarks, achieving a new state-of-the-art on TREC DL 2019 and forming a favorable Pareto frontier that markedly reduces latency. The approach offers a practical, modular path to scalable, effective LLM-based long-document reranking that can adapt to different selectors and languages.
Abstract
Large language models (LLMs) have advanced neural information retrieval (IR), yet applying them to long-document reranking remains computationally expensive and often ineffective when irrelevant content dominates. We begin with an in-depth analysis of decoder-only LLM attention and show that while some heads align with relevance signals, this alignment quickly deteriorates as irrelevant text accumulates. These observations highlight the necessity of explicit block selection to preserve focus and efficiency. We present KeyB2 and KeyB2+, a scalable reranking framework that selects and aggregates the most relevant blocks together with, in KeyB2+, a summary of each document, ensuring that both localized evidence and global semantics are captured before LLM scoring. The KeyB2 family supports flexible selectors (BM25, bi-encoder, and cross-encoder) and adapts decoder-only LLMs to compute fine-grained relevance scores on the selected content. Experiments demonstrate that summary-augmented block selection consistently improves retrieval effectiveness over strong baselines while substantially lowering inference cost, achieving a new state-of-the-art result on the TREC DL 2019 document track (NDCG@10 of 0.738). This establishes KeyB2+ as a practical and effective solution for scalable long-document reranking with LLMs.
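The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenization, the fixed block size, the plain BM25 formula computed over the blocks of a single document, the greedy budget filling, and the placement of the summary after the selected blocks are all assumptions made for illustration, and the function names `split_into_blocks`, `bm25_scores`, and `select_blocks` are hypothetical.

```python
import math
from collections import Counter


def split_into_blocks(text, block_size=8):
    """Split text into consecutive blocks of at most block_size whitespace tokens."""
    tokens = text.split()
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


def bm25_scores(query_tokens, blocks, k1=1.2, b=0.75):
    """Score each block against the query with a plain BM25 formula.

    Assumption: the blocks of one document act as the collection for the
    document-frequency and average-length statistics.
    """
    n = len(blocks)
    lowered = [[tok.lower() for tok in blk] for blk in blocks]
    avg_len = sum(len(blk) for blk in lowered) / n if n else 0.0
    df = Counter()
    for blk in lowered:
        df.update(set(blk))
    scores = []
    for blk in lowered:
        tf = Counter(blk)
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(blk) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_blocks(query, document, token_budget, summary=None, block_size=8):
    """Build the reranker input: top-scoring blocks plus an optional summary.

    Assumptions: the summary is counted against the token budget and is
    expected to fit within it, blocks are added greedily in score order
    (a block that does not fit is skipped), and the selected blocks are
    restored to document order.
    """
    query_tokens = query.lower().split()
    blocks = split_into_blocks(document, block_size)
    summary_tokens = summary.split() if summary else []
    remaining = token_budget - len(summary_tokens)
    scores = bm25_scores(query_tokens, blocks)
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(blocks[i]) <= remaining:
            chosen.append(i)
            remaining -= len(blocks[i])
    chosen.sort()  # restore original document order
    selected = [tok for i in chosen for tok in blocks[i]]
    return " ".join(selected + summary_tokens)
```

In this sketch the returned string stands in for the text passed to the LLM scorer. A bi-encoder or cross-encoder selector would replace `bm25_scores` with a model-based scoring function while leaving the budget logic unchanged.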
