Attention Instruction: Amplifying Attention in the Middle via Prompting

Meiru Zhang; Zaiqiao Meng; Nigel Collier

TL;DR

This paper addresses position bias in retrieval-augmented generation (RAG), particularly the lost-in-the-middle effect, where content in the middle of the context receives too little attention. It tests two prompting strategies, relative position words and absolute document indexes, for steering LLM attention in multi-document question answering across five open-source models. It finds no robust relative position awareness, but clear benefits from absolute attention instructions that use document IDs or position words. Absolute instructions yield gains along the diagonal of the accuracy heatmaps, i.e., where the instructed document matches the gold document's position. Llama-3 is often the strongest model, and results vary by document count (3 vs. 9) and by model. The study offers a practical pathway that requires no fine-tuning to mitigate the neglect of middle content in RAG, and it points to future work on longer contexts, more diverse models, and more realistic document sets.
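The "diagonal" gains can be made concrete with a small calculation. The sketch below is a hypothetical helper, not the authors' evaluation code. It assumes a square grid acc[g][s] of accuracies (in %), where g is the gold document position and s is the position the attention instruction points to, plus a baseline[g] accuracy measured without any attention instruction. The per-cell change corresponds to the ± values described in the Figure 3 caption, and the diagonal cells are those where the instruction targets the gold document.

```python
from statistics import mean
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def delta_grid(acc: Grid, baseline: Sequence[float]) -> Grid:
    """Per-cell accuracy change versus the no-instruction baseline.

    acc[g][s]: accuracy (%) with the gold document at position g and the
        attention instruction targeting position s (hypothetical layout).
    baseline[g]: accuracy (%) at gold position g without an attention instruction.
    """
    if len(acc) != len(baseline):
        raise ValueError("need one baseline value per gold position")
    return [[cell - baseline[g] for cell in row] for g, row in enumerate(acc)]


def diagonal_vs_off_diagonal(delta: Grid) -> Tuple[float, float]:
    """Mean change where the instruction targets the gold document (diagonal)
    versus where it targets some other document (off-diagonal)."""
    n = len(delta)
    if n < 2 or any(len(row) != n for row in delta):
        raise ValueError("expected a square grid with at least two positions")
    diag = [delta[i][i] for i in range(n)]
    off = [delta[i][j] for i in range(n) for j in range(n) if i != j]
    return mean(diag), mean(off)
```

Under these assumptions, a positive diagonal mean alongside a flat or negative off-diagonal mean would match the pattern summarized above. The function names and grid layout are illustrative only.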

Abstract

The context window of large language models (LLMs) has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty accessing and using the middle part of the context because it receives too little attention. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportionate attention through prompting. We augment the original task instruction with attention instructions that direct language models to allocate more attention to a selected segment of the context. We conduct a comprehensive investigation on the multi-document question answering (MDQA) task with both position-based and index-based instructions. We find that language models do not have relative position awareness of the context. Nevertheless, they demonstrate the capacity to adapt their attention to a specific segment using matching indexes. Our analysis contributes to a deeper understanding of position bias in LLMs and provides a pathway to mitigate this bias through instructions, helping LLMs locate and use relevant information from retrieved documents in RAG applications.
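To make the prompt layout concrete, here is a minimal sketch that assembles the four parts described in the Figure 2 caption: task instruction, attention instruction, indexed search results, and question. Everything beyond that overall structure is hypothetical. This includes the instruction wording, the document header formats for the "none", "id" and "position" index types, and the function names. A relative instruction passes a position phrase as the target, while an absolute one names a specific document.

```python
from typing import List, Optional

# Hypothetical ordinal words for position-style indexes (up to 9 documents).
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth"]


def doc_header(i: int, index_type: str) -> str:
    """Label for the document at 0-based position i (wording is an assumption)."""
    if index_type == "none":
        return "Document"
    if index_type == "id":
        return f"Document [{i + 1}]"
    if index_type == "position":
        return f"The {ORDINALS[i]} document"
    raise ValueError(f"unknown index type: {index_type!r}")


def attention_instruction(target: str) -> str:
    """target is a relative segment ("the middle of the search results") or an
    absolute document reference ("Document [2]"); the template is hypothetical."""
    return f"Please pay more attention to {target} when answering the question."


def build_prompt(task: str, docs: List[str], question: str,
                 index_type: str = "id", target: Optional[str] = None) -> str:
    """Assemble task instruction, optional attention instruction,
    indexed search results, and the question, in that order."""
    parts = [task]
    if target is not None:
        parts.append(attention_instruction(target))
    results = "\n".join(f"{doc_header(i, index_type)}: {d}"
                        for i, d in enumerate(docs))
    parts.append("Search results:\n" + results)
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)
```

Omitting target produces the baseline prompt without an attention instruction. This makes it straightforward to compare the same documents and question with and without the added sentence.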


Paper Structure (37 sections, 18 figures, 2 tables, 1 algorithm)

Introduction
Experimental Setup
Overview
Dataset
Models
Attention Instruction
Indexing Documents in Search Results
Can LLMs follow relative attention instructions?
Setting
Results
Discussion
Can we instruct LLMs to attend to a document using absolute attention instruction?
Setting
Results
Discussion
...and 22 more sections

Figures (18)

Figure 1: Top: An example of RAG for open question answering, where the prompt contains the sorted documents. Middle: The position bias (i.e., lost in the middle) can be visualized by attention scores, which show a significant drop in the middle wherever the gold answer is placed. Bottom: We address this by augmenting the prompt with an attention instruction.
Figure 2: Prompt structure. The prompt is structured in four parts: the MDQA task instruction, the attention instruction, the search results containing the provided documents, and the question to answer. The top two boxes show the two types of attention instructions, with the attention segment phrase marked in bold. The boxes on the left show the three index types for documents (no index, ID index, and position index), with the gold document shown in different positions.
Figure 3: Accuracy heatmaps of Llama-2-chat, Llama-3 and Mistral-Instruct-v0.2 when using the relative attention instruction. Top row: results in the no-index setting. Bottom row: results when using ascending IDs as the document index. Each cell shows the accuracy in %, and the ± value indicates the difference from the setting without an attention instruction. Darker cells indicate higher accuracy.
Figure 4: Results of Llama-2-chat, Mistral-Instruct-v0.2, Tulu-2 and Llama-3 using the absolute attention instruction with relative numerical IDs as document indexes.
Figure 5: Attention score heatmaps of Mistral-Instruct-v0.2 using the absolute attention instruction with document ID indexes. Each subplot corresponds to a pair of gold document position and attention segment, in the same arrangement as the accuracy heatmap. The color bar starts at 0, so white areas may correspond to reduced or unchanged attention scores.
...and 13 more figures
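The attention-score visualizations (Figures 1 and 5) depend on turning token-level attention into one score per document. The sketch below shows one plausible way to do that. It is an assumption, not the paper's documented procedure. It takes a single vector of attention weights from one query position to every prompt token (for instance, already averaged over heads and layers), sums the weights inside each document's token span, and optionally normalizes across documents. The hypothetical clipped_increase helper keeps only increases relative to a run without the attention instruction, which echoes a color bar that starts at 0.

```python
from typing import List, Sequence, Tuple


def attention_per_document(weights: Sequence[float],
                           spans: Sequence[Tuple[int, int]],
                           normalize: bool = True) -> List[float]:
    """Sum token-level attention weights inside each document's span.

    weights: attention from one query position to each prompt token
        (assumed to be pre-averaged over heads and layers).
    spans: half-open [start, end) token ranges, one per document, in prompt order.
    normalize: rescale so the document scores sum to 1, ignoring attention
        spent on the instructions and question so runs are comparable.
    """
    scores = []
    for start, end in spans:
        if not 0 <= start <= end <= len(weights):
            raise ValueError(f"span {(start, end)} is out of range")
        scores.append(float(sum(weights[start:end])))
    total = sum(scores)
    if normalize and total > 0:
        scores = [s / total for s in scores]
    return scores


def clipped_increase(with_instr: Sequence[float],
                     without_instr: Sequence[float]) -> List[float]:
    """Per-document attention gain from the instruction. Decreases are clipped
    to 0, so reduced and unchanged documents look the same."""
    if len(with_instr) != len(without_instr):
        raise ValueError("both runs must score the same documents")
    return [max(0.0, a - b) for a, b in zip(with_instr, without_instr)]
```

Normalizing before comparing runs is a design choice. Without it, an attention instruction that pulls attention toward the instruction text itself could make every document look less attended.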
