AttentionRetriever: Attention Layers are Secretly Long Document Retrievers

David Jiahao Fu, Lam Thanh Do, Jiayu Li, Kevin Chen-Chuan Chang

TL;DR

AttentionRetriever presents a training-free, context-aware long-document retrieval approach that leverages attention maps from pretrained LLMs and an entity-based retrieval mechanism to build embeddings and determine the retrieval scope. By combining attention-based sentence scoring with multi-view sentence embeddings and an entity graph for scope control, it addresses contextual, causal, and query dependencies that challenge traditional retrievers on long texts. The method is evaluated on a new LongBench-v2-Retrieval dataset and other benchmarks, showing substantial gains over sparse and dense baselines while maintaining efficiency comparable to dense models; ablations confirm the effectiveness of each component. These results suggest a practical path to efficient long-document QA and retrieval in RAG systems without additional training. Limitations include reliance on relatively large LLMs and the need for larger-scale datasets to fully assess generalization and scalability.

Abstract

Retrieval-augmented generation (RAG) has been widely adopted to help Large Language Models (LLMs) process tasks involving long documents. However, existing retrieval models are not designed for long-document retrieval and fail to address several of its key challenges, including context-awareness, causal dependence, and scope of retrieval. In this paper, we propose AttentionRetriever, a novel long-document retrieval model that leverages the attention mechanism and entity-based retrieval to build context-aware embeddings for long documents and determine the scope of retrieval. In extensive experiments, AttentionRetriever outperforms existing retrieval models on long-document retrieval datasets by a large margin while remaining as efficient as dense retrieval models.
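
To make the idea of attention-based sentence scoring concrete, the following is a minimal sketch and not the authors' implementation. It assumes a query-to-document attention matrix has already been extracted from some pretrained LLM layer; the function names `score_sentences` and `rank_sentences`, the sum-then-average aggregation, and the toy numbers are all illustrative assumptions. The paper's actual layer selection, multi-view embeddings, and entity graph are not reproduced here.

```python
from typing import List, Tuple


def score_sentences(
    attention: List[List[float]],
    sentence_spans: List[Tuple[int, int]],
) -> List[float]:
    """Score each sentence by the attention mass it receives from the query.

    attention[q][d] is the (assumed, precomputed) attention weight from
    query token q to document token d. sentence_spans holds (start, end)
    document-token indices per sentence, with end exclusive.
    """
    num_query_tokens = len(attention)
    scores = []
    for start, end in sentence_spans:
        # Sum the attention landing inside the sentence, then average
        # over query tokens. This aggregation is an assumption.
        mass = sum(sum(row[start:end]) for row in attention)
        scores.append(mass / num_query_tokens)
    return scores


def rank_sentences(scores: List[float], top_k: int) -> List[int]:
    """Return indices of the top_k highest-scoring sentences."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy example: 2 query tokens, 6 document tokens, 3 sentences of 2 tokens.
attention = [
    [0.1, 0.1, 0.6, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.5, 0.3, 0.0, 0.0],
]
spans = [(0, 2), (2, 4), (4, 6)]
scores = score_sentences(attention, spans)
top = rank_sentences(scores, top_k=2)  # sentence 1 ranks first, then sentence 0
```

Summing attention over a span favors longer sentences; dividing by span length is one possible alternative, and which choice matches the paper is not stated in this summary.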

Paper Structure

This paper contains 25 sections, 3 equations, 12 figures, and 7 tables.

Figures (12)

  • Figure 1: Overview of AttentionRetriever.
  • Figure 2: Results of attention analysis.
  • Figure 3: Test results on the needle-in-a-haystack test.
  • Figure 4: Average ranks of the gold paragraph for each subquery over all queries in the dataset for LLaMA-3.2 3B. The layer achieving the highest average rank for each subquery is marked at the top of each line.
  • Figure 5: The quartic approximation of the average ranks in Figure 4.
  • ...and 7 more figures