Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning

Adam Byerly, Daniel Khashabi

TL;DR

This work addresses the problem of position bias in LLMs during multi-document reasoning. It introduces Gold Panning Bandits, a combinatorial bandit framework that treats document reorderings as informative actions, with detectors described by $\mathrm{TPR}_j$ and $\mathrm{FPR}_j$ and a belief state over item relevance. A greedy Gold Panning algorithm matches the most uncertain documents to the most diagnostic positions, achieving per-round complexity $O(N \log N)$ and offering theoretical guarantees (convergence, and myopic optimality under symmetric detectors). Empirically, Gold Panning reduces required LM queries by up to $65\%$ versus random shuffles and yields substantial accuracy gains on real-world tasks, including a $34\%$ accuracy increase at larger context sizes, demonstrating practical inference-time efficiency gains.
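The belief state mentioned above can be illustrated with a small sketch. This summary does not give the paper's exact update rule, so the code below assumes the textbook Bayes update for a binary detector: the position holding a document emits a positive or negative signal, and that position is characterized by its $\mathrm{TPR}_j$ and $\mathrm{FPR}_j$. The function name `update_belief` and its parameters are hypothetical.

```python
def update_belief(prior: float, signal: bool, tpr: float, fpr: float) -> float:
    """Posterior probability that a document is relevant after one reading.

    Assumption (not taken from the paper): the detector at a position fires
    with probability `tpr` when the document is relevant and with
    probability `fpr` when it is not.
    """
    if signal:
        like_relevant, like_irrelevant = tpr, fpr
    else:
        like_relevant, like_irrelevant = 1.0 - tpr, 1.0 - fpr

    evidence = like_relevant * prior + like_irrelevant * (1.0 - prior)
    if evidence == 0.0:
        # The observation is impossible under the model; keep the prior.
        return prior
    return like_relevant * prior / evidence
```

Under this assumption, a position with $\mathrm{TPR}_j = \mathrm{FPR}_j$ leaves the belief unchanged, which is one way to see why some positions are more diagnostic than others.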

Abstract

Large language models exhibit a strong position bias in multi-document contexts, systematically prioritizing information based on location rather than relevance. While existing approaches treat this bias as noise to be mitigated, we introduce Gold Panning Bandits, a framework that leverages position bias as a diagnostic signal: by reordering documents and observing shifts in the model's responses, we can efficiently identify the most relevant content. We frame the problem of choosing reorderings as a bipartite matching problem. While an optimal assignment can be computed at each iteration with the Hungarian algorithm in $O(N^3)$ time, we propose a greedy $O(N \log N)$ strategy that achieves comparable performance by prioritizing the placement of the most uncertain documents in the most informative positions. Our approach identifies relevant documents using up to 65% fewer language model queries than random permutation baselines on knowledge-intensive NLP tasks, substantially reducing computational cost without model retraining. This work demonstrates that inherent LLM biases can be transformed from liabilities into assets for efficient, inference-time optimization.
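The greedy strategy described in the abstract can be sketched as two sorts followed by a pairing, which is where the $O(N \log N)$ cost comes from. This is a minimal illustration, not the authors' implementation: it assumes uncertainty is measured by the binary entropy of each document's relevance belief, and it uses $|\mathrm{TPR}_j - \mathrm{FPR}_j|$ as a stand-in score for how informative a position is. Both scoring choices, and the names `binary_entropy` and `greedy_assignment`, are assumptions made for this example.

```python
import math
from typing import List


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) belief; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def greedy_assignment(
    beliefs: List[float], tpr: List[float], fpr: List[float]
) -> List[int]:
    """Return `placement`, where placement[j] is the document put at position j.

    Assumed scoring (hypothetical, for illustration only):
      - document uncertainty = binary entropy of its relevance belief
      - position diagnosticity = |TPR_j - FPR_j|
    """
    n = len(beliefs)
    # Most uncertain documents first.
    docs = sorted(range(n), key=lambda i: binary_entropy(beliefs[i]), reverse=True)
    # Most diagnostic positions first.
    positions = sorted(range(n), key=lambda j: abs(tpr[j] - fpr[j]), reverse=True)

    placement = [0] * n
    for doc, pos in zip(docs, positions):
        placement[pos] = doc
    return placement


if __name__ == "__main__":
    beliefs = [0.5, 0.9, 0.99]  # document 0 is the most uncertain
    tpr = [0.6, 0.9, 0.7]
    fpr = [0.5, 0.1, 0.4]       # position 1 is the most diagnostic
    print(greedy_assignment(beliefs, tpr, fpr))
```

In the usage example, the most uncertain document (index 0) lands in the most diagnostic position (index 1). An exact alternative would solve the full assignment problem each round, for example with the Hungarian algorithm at $O(N^3)$, as the abstract notes.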

Paper Structure

This paper contains 34 sections, 7 theorems, 28 equations, 6 figures, 1 algorithm.

Key Result

Theorem 4.1

The Gold Panning strategy achieves an expected one-step reduction in total entropy that is greater than or equal to that of a random permutation strategy (e.g., PSC).
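In symbols, the statement can be sketched as follows. The notation is illustrative rather than the paper's own: it assumes that total entropy is the sum of binary entropies of the per-document relevance beliefs $p_i^{t}$ at round $t$, that $\pi_{\mathrm{GP}}$ denotes the ordering chosen by Gold Panning, and that $\pi$ is a permutation drawn uniformly at random.

$$
H(p^{t}) = \sum_{i=1}^{N} h\!\left(p_i^{t}\right), \qquad h(p) = -p \log_2 p - (1-p) \log_2 (1-p)
$$

$$
\mathbb{E}\!\left[ H(p^{t}) - H(p^{t+1}) \,\middle|\, \pi_{\mathrm{GP}} \right] \;\ge\; \mathbb{E}_{\pi}\!\left[ H(p^{t}) - H(p^{t+1}) \,\middle|\, \pi \right]
$$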

Figures (6)

  • Figure 1: Overview of our Gold Panning algorithm. We leverage an LLM's known positional bias (left) to solve a needle-in-haystack task (center). Our iterative method (right) involves querying the model, updating beliefs about document relevance, and strategically reordering documents for the next query. By placing uncertain documents in the most informative positions, we rapidly identify relevant content with fewer queries.
  • Figure 2: Performance of Gold Panning (GP), the Hungarian method, and the Permutation Self-Consistency (PSC) baseline across $20$ queries for varying numbers of documents ($N = 10$, $30$, and $50$). Accuracy@k is averaged over $10{,}000$ Monte Carlo runs. The results show that GP's performance is nearly indistinguishable from that of the optimal Hungarian method and that GP significantly outperforms the PSC baseline at every scale.
  • Figure 3: Performance comparison of Gold Panning versus baselines on GPT-4o-mini across two context sizes. The plots show answer accuracy over successive iterations. With 100 facts (left), both the PSC and TS methods largely fail to improve performance, while GP provides modest gains. With 400 facts (right), both PSC and TS continue to provide little improvement, while GP provides a roughly 34% increase (from 0.57 to 0.75), beating out both baselines. The single-shot average (TS) represents expected performance from a single query at a random position.
  • Figure 4: Performance Degradation of the Gold Panning Strategy under Noisy Parameter Estimates. The plot shows performance over 20 iterations, averaged across $10,000$ Monte Carlo runs, for four levels of noise in the agent's estimates of detector TPR and FPR. The inset provides a magnified view of the final iterations, highlighting the subtle divergence between the "Perfect" and "Low Noise" scenarios. The strategy shows strong resilience to minor estimation errors and degrades gracefully under more significant noise.
  • Figure 5: Performance of Gold Panning vs. Permutation Self-Consistency (PSC) as a function of environment homogeneity. The x-axis plots the Beta concentration parameter ($\alpha$) on a log scale, where higher values correspond to more homogeneous detectors. The y-axis shows the final ranking accuracy after $20$ iterations. The advantage of Gold Panning is largest in heterogeneous environments (low concentration) and vanishes as the environment becomes homogeneous.
  • ...and 1 more figure

Theorems & Definitions (12)

  • Theorem 4.1: Greater One-Step Entropy Reduction than Random Strategy
  • Theorem 4.2: Myopic Optimality for Symmetric Detectors
  • Proposition A.1
  • proof
  • Theorem E.1: Belief Entropy Converges for Any Strategy
  • proof
  • Theorem E.2: Posterior Consistency Under Minimal Informativity
  • proof
  • Theorem F.1: Greater One-Step Entropy Reduction than Random Strategy
  • proof
  • ...and 2 more