Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs
Runchu Tian, Yanghao Li, Yuepeng Fu, Siyang Deng, Qinyu Luo, Cheng Qian, Shuo Wang, Xin Cong, Zhong Zhang, Yesai Wu, Yankai Lin, Huadong Wang, Xiaojiang Liu
TL;DR
This work investigates how the spacing and distribution of multiple relevant information pieces in long-context inputs bias LLMs. It introduces LongPiBench, a benchmark that isolates absolute and relative positional effects across four input lengths up to 256K tokens and evaluates eleven models (five commercial and six open-source). The findings show that most models are now robust to the traditional "lost in the middle" effect, but that biases tied to the relative positioning of relevant pieces persist, especially in retrieval tasks. Model size alone does not fix these biases, which highlights the need for targeted mitigation and contextualization strategies. Overall, LongPiBench provides a framework for diagnosing and addressing positional biases in long-context LLMs, with implications for improving robustness in real-world long-text applications.
Abstract
Positional bias in large language models (LLMs) hinders their ability to effectively process long inputs. A prominent example is the "lost in the middle" phenomenon, where LLMs struggle to utilize relevant information situated in the middle of the input. While prior research primarily focuses on single pieces of relevant information, real-world applications often involve multiple relevant information pieces. To bridge this gap, we present LongPiBench, a benchmark designed to assess positional bias involving multiple pieces of relevant information. Thorough experiments are conducted with five commercial and six open-source models. These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces. These findings highlight the importance of evaluating and reducing positional biases to advance the capabilities of LLMs.
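To make the distinction between absolute and relative position concrete, the following is a minimal sketch of how a test input with several relevant pieces might be laid out. It is not the construction used in LongPiBench; the function names `place_relevant` and `build_context` and the parameters `start` and `gap` are hypothetical and chosen for illustration only. Varying `start` with `gap` fixed shifts the whole group of relevant pieces (absolute position), while varying `gap` changes the distance between them (relative position).

```python
def place_relevant(num_slots, num_relevant, start, gap):
    """Return slot indices for the relevant pieces.

    The first piece goes at `start`; each later piece is `gap` slots further on.
    (Hypothetical helper, not part of LongPiBench.)
    """
    if start < 0 or gap < 1:
        raise ValueError("start must be >= 0 and gap must be >= 1")
    positions = [start + i * gap for i in range(num_relevant)]
    if positions and positions[-1] >= num_slots:
        raise ValueError("relevant pieces do not fit in the context")
    return positions


def build_context(fillers, relevant, start, gap):
    """Interleave irrelevant filler items with relevant pieces at fixed spacing."""
    num_slots = len(fillers) + len(relevant)
    positions = set(place_relevant(num_slots, len(relevant), start, gap))
    filler_iter = iter(fillers)
    relevant_iter = iter(relevant)
    # Walk the slots in order, drawing from the matching pool for each slot.
    return [
        next(relevant_iter) if i in positions else next(filler_iter)
        for i in range(num_slots)
    ]


if __name__ == "__main__":
    fillers = [f"f{i}" for i in range(8)]
    relevant = ["R0", "R1", "R2"]
    # Same pieces, two spacings: densely packed versus spread out.
    print(build_context(fillers, relevant, start=1, gap=1))
    print(build_context(fillers, relevant, start=1, gap=4))
```

Under these assumptions, a model's accuracy could be compared across values of `gap` while `start` is held fixed, which is the kind of comparison that separates spacing effects from the "lost in the middle" effect.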
