Do RAG Systems Really Suffer From Positional Bias?

Florin Cuconasu, Simone Filice, Guy Horowitz, Yoelle Maarek, Fabrizio Silvestri

TL;DR

This paper investigates whether positional bias in LLMs truly degrades performance in Retrieval Augmented Generation (RAG) systems. Through controlled and real-world experiments across multiple benchmarks and retrieval pipelines, it shows that top-ranked passages often include both relevant and highly distracting content, and that stronger retrievers can exacerbate distraction. In controlled settings, positional bias appears pronounced, but in realistic scenarios the effect is marginal because relevant and distracting passages compete for the same top slots, diminishing any single-position advantage. Consequently, sophisticated passage-reordering strategies offer little improvement over random shuffling, shifting the focus toward improving retrieval quality and model robustness to distraction. The findings challenge prior emphasis on positional bias as a primary bottleneck and highlight the central role of distractor management in RAG systems.

Abstract

Retrieval Augmented Generation enhances LLM accuracy by adding passages retrieved from an external corpus to the LLM prompt. This paper investigates how positional bias - the tendency of LLMs to weight information differently based on its position in the prompt - affects not only the LLM's capability to capitalize on relevant passages, but also its susceptibility to distracting passages. Through extensive experiments on three benchmarks, we show how state-of-the-art retrieval pipelines, while attempting to retrieve relevant passages, systematically bring highly distracting ones to the top ranks, with over 60% of queries containing at least one highly distracting passage among the top-10 retrieved passages. As a result, the impact of the LLM positional bias, which in controlled settings is often reported as very prominent by related works, is actually marginal in real scenarios since both relevant and distracting passages are, in turn, penalized. Indeed, our findings reveal that sophisticated strategies that attempt to rearrange the passages based on LLM positional preferences do not perform better than random shuffling.
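As a rough illustration of the "over 60% of queries" statistic, the sketch below counts how many queries have at least one highly distracting passage among their top-k retrieved passages. It is not the authors' code: the function name, the input layout (one list of per-passage distracting-effect scores per query, in rank order, with relevant passages assumed to be scored 0), and the 0.8 cutoff are all assumptions made for this example.

```python
from typing import Sequence


def share_with_hard_distractor(
    de_scores_per_query: Sequence[Sequence[float]],
    k: int = 10,
    threshold: float = 0.8,  # hypothetical cutoff for "highly distracting"
) -> float:
    """Fraction of queries whose top-k passages include a hard distractor.

    de_scores_per_query[i][j] is the (assumed) distracting-effect score of
    the passage ranked j-th for query i.
    """
    if not de_scores_per_query:
        raise ValueError("need at least one query")
    hits = sum(
        1
        for scores in de_scores_per_query
        if any(score >= threshold for score in scores[:k])
    )
    return hits / len(de_scores_per_query)


# Toy input: four queries with made-up scores.
toy_scores = [[0.1, 0.9], [0.2, 0.3], [0.85], [0.0]]
print(share_with_hard_distractor(toy_scores, k=10))
```

Lowering `k` restricts the check to fewer top ranks, which is how the statistic could be traced as a function of the number of retrieved passages.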

Paper Structure

This paper contains 19 sections, 1 equation, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Results of different retrieval pipelines when varying the number $k$ of retrieved passages. We compute the distracting effect on Qwen 2.5 7B.
  • Figure 2: Controlled experiments results for Qwen 2.5 7B. (a) Average accuracy when rotating a single relevant passage among weak distractors. (b) Average distracting effect when rotating a hard distractor among weak distractors. Both exhibit the characteristic U-shaped positional bias pattern.
  • Figure 3: Example showing how the position of the hard distractor affects Qwen 2.5 7B's response when a relevant passage is fixed in position 2. When a hard distractor (Document 5, DE=0.98) is placed in position 5 (highest distracting effect according to Fig. 2b), the model provides an incorrect answer based on the hard distractor. However, simply moving the hard distractor to position 3 (lowest distracting effect), while maintaining the relevant passage in position 2, results in the model correctly answering "2002".
  • Figure 4: Results on PopQA of different retrieval pipelines when varying the number $k$ of retrieved passages. We compute the distracting effect on Qwen 2.5 7B.
  • Figure 5: Results on NQ of different retrieval pipelines when varying the number $k$ of retrieved passages. We compute the distracting effect on Qwen 2.5 7B.
  • ...and 8 more figures
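
The controlled setup described in the Figure 2 caption (rotating one passage through every position among weak distractors) and the random-shuffling baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `rotate_through_positions` and `random_shuffle` are hypothetical, and passages are represented as plain strings.

```python
import random
from typing import List, Optional


def rotate_through_positions(target: str, fillers: List[str]) -> List[List[str]]:
    """Return one passage ordering per possible position of `target`.

    `fillers` (e.g. weak distractors) keep their relative order; `target`
    (a relevant passage or a hard distractor) is inserted at index 0, 1, ...
    """
    return [fillers[:i] + [target] + fillers[i:] for i in range(len(fillers) + 1)]


def random_shuffle(passages: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of `passages`, leaving the input untouched."""
    rng = random.Random(seed)
    return rng.sample(passages, len(passages))


# Each ordering would be turned into a prompt and scored separately.
for ordering in rotate_through_positions("RELEVANT", ["weak1", "weak2", "weak3"]):
    print(ordering)

print(random_shuffle(["p1", "p2", "p3", "p4"], seed=0))
```

Averaging a metric such as accuracy over queries for each position of the target passage yields a per-position curve of the kind the caption describes.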