Retrieval Heads are Dynamic
Yuping Lin, Zitao Li, Yue Xing, Pengfei He, Yingqian Cui, Yaliang Li, Bolin Ding, Jingren Zhou, Jiliang Tang
TL;DR
This work reframes retrieval heads in autoregressive LLMs as a dynamic, context-dependent set rather than a fixed subset of attention heads. It reports three core findings: the set of retrieval heads varies across generation steps (dynamism); the heads active at a given step cannot be effectively replaced by static ones (irreplaceability); and the model's final hidden state strongly encodes future retrieval patterns (correlation). The authors validate these claims on Needle-in-a-Haystack and HotpotQA, and demonstrate practical benefits by integrating dynamic retrieval heads into a Dynamic RAG framework, where they improve retrieval and reasoning performance. The results point to a planning-like mechanism in LLMs and suggest state-aware intervention strategies for more precise, context-sensitive information retrieval.
Abstract
Recent studies have identified "retrieval heads" in Large Language Models (LLMs) that are responsible for extracting information from input contexts. However, prior work largely relies on static statistics aggregated across datasets, identifying heads that perform retrieval on average. This perspective overlooks the fine-grained temporal dynamics of autoregressive generation. In this paper, we investigate retrieval heads from a dynamic perspective. Through extensive analysis, we establish three core claims: (1) Dynamism: Retrieval heads vary dynamically across timesteps; (2) Irreplaceability: Dynamic retrieval heads are specific to each timestep and cannot be effectively replaced by static retrieval heads; and (3) Correlation: The model's hidden state encodes a predictive signal for future retrieval head patterns, indicating an internal planning mechanism. We validate these findings on the Needle-in-a-Haystack task and a multi-hop QA task, and quantify the difference in utility between dynamic and static retrieval heads in a Dynamic Retrieval-Augmented Generation framework. Our study provides new insights into the internal mechanisms of LLMs.
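To make the static versus dynamic distinction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified retrieval score (the attention mass a head places on the needle positions), which is a stand-in for whatever scoring the paper actually uses. The function names, the toy attention values, and the top-k selection rule are all hypothetical and chosen only for illustration.

```python
from typing import List, Set

# attn[t][h] is head h's attention distribution over context positions
# at generation step t (toy data below; a real model would supply these).
Attn = List[List[List[float]]]


def step_scores(attn: Attn, needle: Set[int]) -> List[List[float]]:
    """Assumed proxy score: attention mass each head puts on needle positions, per step."""
    return [[sum(row[i] for i in needle) for row in step] for step in attn]


def top_k(scores: List[float], k: int) -> Set[int]:
    """Indices of the k highest-scoring heads (ties broken by lower head index)."""
    order = sorted(range(len(scores)), key=lambda h: (-scores[h], h))
    return set(order[:k])


def dynamic_heads(attn: Attn, needle: Set[int], k: int) -> List[Set[int]]:
    """One head set per generation step, selected from that step's scores only."""
    return [top_k(s, k) for s in step_scores(attn, needle)]


def static_heads(attn: Attn, needle: Set[int], k: int) -> Set[int]:
    """A single head set, selected from scores averaged over all steps."""
    scores = step_scores(attn, needle)
    n_heads = len(scores[0])
    mean = [sum(s[h] for s in scores) / len(scores) for h in range(n_heads)]
    return top_k(mean, k)


# Toy example: 2 steps, 3 heads, 4 context positions, needle at positions 1 and 2.
toy_attn = [
    [[0.1, 0.4, 0.4, 0.1], [0.7, 0.1, 0.1, 0.1], [0.25, 0.3, 0.2, 0.25]],
    [[0.8, 0.1, 0.0, 0.1], [0.0, 0.5, 0.4, 0.1], [0.25, 0.3, 0.2, 0.25]],
]
toy_needle = {1, 2}

per_step = dynamic_heads(toy_attn, toy_needle, k=1)
averaged = static_heads(toy_attn, toy_needle, k=1)
print("dynamic:", per_step)
print("static: ", averaged)
```

In this toy input, head 0 attends to the needle at the first step and head 1 at the second, so the per-step selection changes while the averaged selection keeps a single head. That is the kind of step-level variation an averaged statistic hides.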
