AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

David Jiahao Fu; Lam Thanh Do; Jiayu Li; Kevin Chen-Chuan Chang

TL;DR

AttentionRetriever presents a training-free, context-aware long-document retrieval approach that leverages attention maps from pretrained LLMs and an entity-based retrieval mechanism to build embeddings and determine the retrieval scope. By combining attention-based sentence scoring with multi-view sentence embeddings and an entity graph for scope control, it addresses contextual, causal, and query dependencies that challenge traditional retrievers on long texts. The method is evaluated on a new LongBench-v2-Retrieval dataset and other benchmarks, showing substantial gains over sparse and dense baselines while maintaining efficiency comparable to dense models; ablations confirm the effectiveness of each component. These results suggest a practical path to efficient long-document QA and retrieval in RAG systems without additional training. Limitations include reliance on relatively large LLMs and the need for larger-scale datasets to fully assess generalization and scalability.

Abstract

Retrieval-augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) process tasks involving long documents. However, existing retrieval models are not designed for long document retrieval and fail to address several of its key challenges, including context-awareness, causal dependence, and scope of retrieval. In this paper, we propose AttentionRetriever, a novel long document retrieval model that leverages the attention mechanism and entity-based retrieval to build context-aware embeddings for long documents and determine the scope of retrieval. Through extensive experiments, we find that AttentionRetriever outperforms existing retrieval models on long document retrieval datasets by a large margin while remaining as efficient as dense retrieval models.
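To make the idea of attention-based sentence scoring concrete, the following is a minimal sketch, not the authors' implementation. It assumes a query-to-document attention matrix has already been extracted from some layer of a pretrained LLM, and that sentence boundaries are known as token spans. The function names `score_sentences` and `top_k_sentences`, the mean-pooling aggregation, and the toy numbers are all illustrative assumptions; the source does not specify how attention is aggregated.

```python
def score_sentences(attention, sentence_spans):
    """Aggregate query-to-document attention into one score per sentence.

    attention: list of rows, one per query token; each row holds the
        attention weight that query token assigns to every document token.
    sentence_spans: list of (start, end) token index pairs, end exclusive.

    Assumption: a sentence's score is the mean attention weight over all
    (query token, sentence token) pairs. Other poolings are possible.
    """
    num_query_tokens = len(attention)
    scores = []
    for start, end in sentence_spans:
        total = sum(sum(row[start:end]) for row in attention)
        scores.append(total / (num_query_tokens * (end - start)))
    return scores


def top_k_sentences(scores, k):
    """Return the indices of the k highest-scoring sentences, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy example: 2 query tokens attending over 4 document tokens,
# where the document is split into two 2-token sentences.
toy_attention = [
    [0.1, 0.1, 0.6, 0.2],
    [0.2, 0.0, 0.5, 0.3],
]
toy_spans = [(0, 2), (2, 4)]
toy_scores = score_sentences(toy_attention, toy_spans)
best = top_k_sentences(toy_scores, k=1)
```

In this toy input the second sentence receives most of the attention mass, so it is ranked first. Normalizing by sentence length is one design choice that keeps long sentences from dominating purely because they contain more tokens.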

Paper Structure (25 sections, 3 equations, 12 figures, 7 tables)

Introduction
Related Works
Long Document Retrieval
Context Window Length Extension
Attention Mechanism Interpretation
Observations
Method
Overview
Attention for Sentence Scoring
Sentence Embedding for Multi-view Similarity Search
Entity-based Retrieval
Dataset Construction
Experiments
Experimental Setup
Main Results
...and 10 more sections

Figures (12)

Figure 1: Overview of AttentionRetriever.
Figure 2: Results of attention analysis.
Figure 3: Test results on the needle-in-a-haystack test.
Figure 4: Average ranks of the gold paragraph for each subquery over all queries in the dataset for LLaMA-3.2 3B. The layer achieving the highest average rank for each subquery is marked at the top of each line.
Figure 5: The quartic approximation of the average ranks in Figure 4.
...and 7 more figures
