Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning
Adam Byerly, Daniel Khashabi
TL;DR
This work addresses position bias in LLMs during multi-document reasoning. It introduces Gold Panning Bandits, a combinatorial bandit framework that treats document reorderings as informative actions: each context position $j$ acts as a noisy detector characterized by its true- and false-positive rates, $\mathrm{TPR}_j$ and $\mathrm{FPR}_j$, and the framework maintains a belief state over item relevance. A greedy Gold Panning algorithm matches the most uncertain documents to the most diagnostic positions at a per-round cost of $O(N \log N)$, with theoretical guarantees (convergence, and myopic optimality under symmetric detectors). Empirically, Gold Panning reduces the number of required LM queries by up to $65\%$ relative to random shuffles and yields substantial accuracy gains on real-world tasks, including a $34\%$ accuracy increase at larger context sizes, demonstrating practical inference-time efficiency gains.
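To make the belief state concrete, the following minimal sketch shows how a single document's relevance belief could be updated after one query. It assumes that position $j$ behaves as a binary detector with rates $\mathrm{TPR}_j$ and $\mathrm{FPR}_j$, and that each observation is a yes/no signal for the document placed there. The function name `update_belief` and this observation model are illustrative assumptions, not the authors' implementation.

```python
def update_belief(prior, detected, tpr, fpr):
    """Posterior P(relevant | observation) for one document via Bayes' rule.

    Assumption (not from the paper): the observation is a binary signal
    emitted by the position the document occupied, with the given rates.
    """
    if detected:
        like_relevant, like_irrelevant = tpr, fpr
    else:
        like_relevant, like_irrelevant = 1.0 - tpr, 1.0 - fpr

    numerator = like_relevant * prior
    evidence = numerator + like_irrelevant * (1.0 - prior)
    # If the observation has zero probability under the model, keep the prior.
    return prior if evidence == 0.0 else numerator / evidence
```

Under this model, a position with $\mathrm{TPR}_j = \mathrm{FPR}_j$ leaves the belief unchanged, which is one way to read "diagnostic": the further apart the two rates are, the more a single observation moves the belief.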
Abstract
Large language models exhibit a strong position bias in multi-document contexts, systematically prioritizing information based on location rather than relevance. While existing approaches treat this bias as noise to be mitigated, we introduce Gold Panning Bandits, a framework that leverages position bias as a diagnostic signal: by reordering documents and observing shifts in the model's responses, we can efficiently identify the most relevant content. We frame the problem of choosing reorderings as a bipartite matching problem. While an optimal assignment can be computed at each iteration with the Hungarian algorithm in $O(N^3)$ time, we propose a greedy $O(N \log N)$ strategy that achieves comparable performance by prioritizing the placement of the most uncertain documents in the most informative positions. Our approach identifies relevant documents using up to 65\% fewer language model queries than random permutation baselines on knowledge-intensive NLP tasks, substantially reducing computational cost without model retraining. This work demonstrates that inherent LLM biases can be transformed from liabilities into assets for efficient, inference-time optimization.
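The greedy matching step can be sketched as two sorts followed by a pairing, which is where the $O(N \log N)$ cost comes from. The sketch below is hedged: it assumes uncertainty is measured by the binary entropy of each document's relevance belief and that a position's informativeness is approximated by $|\mathrm{TPR}_j - \mathrm{FPR}_j|$. Both scoring choices and the name `greedy_assignment` are illustrative assumptions, not necessarily the criteria used in the paper.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a Bernoulli belief; 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def greedy_assignment(beliefs, tprs, fprs):
    """Return `order`, where order[pos] is the document placed at `pos`.

    Assumed scores (hypothetical): document uncertainty is binary entropy,
    position informativeness is |TPR - FPR|.
    """
    n = len(beliefs)
    if not (len(tprs) == len(fprs) == n):
        raise ValueError("beliefs, tprs, and fprs must have equal length")

    # Most uncertain documents first.
    docs = sorted(range(n), key=lambda i: binary_entropy(beliefs[i]), reverse=True)
    # Most diagnostic positions first.
    positions = sorted(range(n), key=lambda j: abs(tprs[j] - fprs[j]), reverse=True)

    order = [None] * n
    for doc, pos in zip(docs, positions):
        order[pos] = doc
    return order


order = greedy_assignment(
    beliefs=[0.95, 0.5, 0.2],
    tprs=[0.9, 0.6, 0.8],
    fprs=[0.1, 0.5, 0.3],
)
```

In this toy input, document 1 (belief 0.5) is the most uncertain and lands in position 0, the position with the largest gap between its rates, while the near-certain document 0 is sent to the least diagnostic position.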
