ReAttn: Improving Attention-based Re-ranking via Attention Re-weighting

Yuxing Tian, Fengran Mo, Weixu Zhang, Yiyan Qi, Jian-Yun Nie

TL;DR

ReAttn tackles two core weaknesses of attention-based re-ranking—signal concentration and lexical bias—by applying two post-hoc refinements to existing attention signals: cross-document IDF weighting to down-weight ubiquitous query tokens and entropy-based regularization to encourage broader, more balanced attention across documents. Implemented without any additional training, the method computes token-level weights $w(t)$ from document frequencies and derives final document scores via $B_i$, $p_{i,j}$, $E_i$, and $W_i$, yielding $s^{final}_{d_i}$. Across BEIR and long-context reasoning benchmarks, ReAttn yields consistent improvements when paired with ICR or QRhead over multiple base LLMs, approaching or surpassing some supervised baselines with low overhead. The results demonstrate practical impact for zero-shot, attention-based re-ranking in diverse domains and longer contexts, and point to broader applicability in IR tasks that suffer from attention skew and lexical bias. Future work may explore interactions with IR-tuned models, head-level attribution, and multilingual retrieval scenarios to further generalize the approach.

Abstract

The strong capabilities of recent Large Language Models (LLMs) have made them highly effective for the zero-shot re-ranking task. Attention-based re-ranking methods, which derive relevance scores directly from attention weights, offer an efficient and interpretable alternative to generation-based re-ranking methods. However, they still face two major limitations. First, attention signals are highly concentrated on a small subset of tokens within a few documents, making the remaining documents indistinguishable. Second, attention often overemphasizes phrases that are lexically similar to the query, yielding biased rankings in which irrelevant documents with mere lexical resemblance are regarded as relevant. In this paper, we propose ReAttn, a post-hoc re-weighting strategy for attention-based re-ranking methods. It first computes a cross-document IDF weighting to down-weight attention on query-overlapping tokens that appear frequently across the candidate documents, reducing lexical bias and emphasizing distinctive terms. It then employs entropy-based regularization to mitigate over-concentrated attention, encouraging a more balanced distribution across informative tokens. Both adjustments operate directly on existing attention weights without additional training or supervision. Extensive experiments demonstrate the effectiveness of our method.
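To make the two adjustments concrete, the following is a minimal Python sketch of how post-hoc re-weighting of this kind could look. It is not the paper's formulation: the exact definitions of $w(t)$, $B_i$, $p_{i,j}$, $E_i$, $W_i$, and $s^{final}_{d_i}$ are not given in this summary. The sketch assumes a smoothed, normalized IDF for query-overlapping tokens, a normalized Shannon entropy over each document's re-weighted attention, and a multiplicative combination controlled by a hypothetical parameter `lam`. All function names and the toy attention values are illustrative.

```python
import math
from collections import Counter


def idf_weights(docs_tokens, query_tokens):
    """Cross-document IDF-style weights for query tokens.

    Assumed form (not from the paper):
        w(t) = log((N + 1) / (df(t) + 1)) / log(N + 1)
    so a query token found in every candidate gets weight 0,
    and one found in no candidate gets weight 1.
    """
    n_docs = len(docs_tokens)
    query = set(query_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens) & query)
    norm = math.log(n_docs + 1)
    return {t: math.log((n_docs + 1) / (df[t] + 1)) / norm for t in query}


def normalized_entropy(values):
    """Shannon entropy of the normalized values, scaled to [0, 1]."""
    total = sum(values)
    if total <= 0 or len(values) < 2:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(values))


def rerank(docs_tokens, docs_attention, query_tokens, lam=0.5):
    """Re-weight per-token attention, then reward spread-out attention.

    Assumed scoring rule: score = sum(reweighted) * (1 + lam * entropy).
    Tokens that do not overlap with the query keep weight 1.
    """
    weights = idf_weights(docs_tokens, query_tokens)
    scores = []
    for tokens, attn in zip(docs_tokens, docs_attention):
        reweighted = [a * weights.get(t, 1.0) for t, a in zip(tokens, attn)]
        base = sum(reweighted)
        scores.append(base * (1.0 + lam * normalized_entropy(reweighted)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy example: documents 0 and 1 have the same raw attention mass (1.0),
# but document 0 receives most of it on tokens copied from the query.
query = ["jiang", "wen", "producer"]
docs = [
    ["jiang", "wen", "producer", "film"],
    ["jiang", "wen", "directed", "devils"],
    ["weather", "report", "rain", "today"],
]
attention = [
    [0.4, 0.4, 0.1, 0.1],
    [0.2, 0.2, 0.3, 0.3],
    [0.05, 0.05, 0.05, 0.05],
]
order, scores = rerank(docs, attention, query)
print(order)
```

In this toy setting the raw attention sums cannot separate the first two documents, while the re-weighted scores rank document 1 first because less of its attention sits on query tokens shared across candidates. The entropy term here is only one plausible way to counter signal concentration; a different regularizer would change the scores but not the overall structure of the sketch.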

Paper Structure (19 sections, 13 equations, 3 figures, 6 tables)

This paper contains 19 sections, 13 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Illustration of two issues in attention-based re-ranking. (a) Signal concentration: the total attention mass is heavily concentrated on a few tokens of a few documents, leaving most candidates with negligible attention scores. (b) Lexical bias: attention disproportionately highlights tokens that are lexically similar to the query (e.g., producer, Jiang Wen), causing irrelevant documents with lexical overlap to the query to receive inflated relevance scores and misleading the ranking.
  • Figure 2: Prompt used for ICR, QRhead and our method.
  • Figure 3: Prompt used for RankGPT.