Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization
Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
TL;DR
This work investigates why large language models struggle to retrieve information located in the middle of long inputs. The authors link the phenomenon to a robust U-shaped positional attention bias, showing that early and late input regions attract more attention regardless of content. They propose a calibration method, found-in-the-middle, to disentangle bias from true relevance and demonstrate that calibrated attention improves the model’s ability to locate middle-context information and enhances RAG performance by up to about 15 percentage points across tasks and models. The method is inference-time and can complement reordering-based pipelines, offering a principled way to improve long-context utilization in practical deployments. These findings provide a deeper understanding of attention biases in LLMs and lay groundwork for more reliable long-context reasoning.
Abstract
Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias where the tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
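To make the idea of separating positional bias from relevance concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that observed attention decomposes additively into a relevance term and a position-only bias term, and that an estimate of the bias is already available. The function names `u_shaped_bias` and `calibrate`, the quadratic bias shape, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def u_shaped_bias(n, strength=1.0):
    """Hypothetical U-shaped bias: high at both ends, zero in the middle."""
    pos = np.linspace(-1.0, 1.0, n)
    return strength * pos ** 2

def calibrate(observed, bias_estimate):
    """Remove an estimated position-only bias from observed attention.

    Assumption (for illustration only): attention = relevance + bias.
    """
    return np.asarray(observed) - np.asarray(bias_estimate)

n_docs = 9
relevance = np.zeros(n_docs)
relevance[4] = 0.5                    # the only relevant document sits in the middle
bias = u_shaped_bias(n_docs)          # assumed known here; in practice it must be estimated
observed = relevance + bias           # simulated raw attention per document position
calibrated = calibrate(observed, bias)

# Position that receives the most attention before and after calibration.
print(int(np.argmax(observed)), int(np.argmax(calibrated)))
```

In this toy setting the raw attention peaks at an edge position even though the relevant document is in the middle, while the calibrated scores peak at the relevant document. How well this carries over to a real model depends on how accurately the bias can be estimated and on whether the additive assumption holds.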
