Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers
Todd Nief, David Reber, Sean Richardson, Ari Holtzman
TL;DR
This work introduces dynamic weight grafting, a method for causally localizing how finetuned knowledge is retrieved in transformer-based LLMs that addresses limitations of activation patching. The authors identify two retrieval pathways: an enrichment pathway at entity positions and a recall pathway near the final token. Under certain prompts, either pathway alone suffices for relation completion. Using token-wise, component-wise grafting, they localize recall to the output projection and FFN at the final token, and show that task-specific attention at the first entity is also crucial. The study spans multiple models and synthetic datasets and shows that finetuned information is retrieved through distinct, interpretable components, with implications for interpretability, targeted editing, and safer knowledge updates in LLMs.
Abstract
When an LLM learns a new fact during finetuning (e.g., a new movie release or a newly elected pope), where does this information go? Are entities enriched with relation information, or do models recall information just-in-time before a prediction? Or is ``all of the above'' true, with LLMs implementing multiple redundant heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they usually \textit{replace} parts of the residual stream, thus overriding previous information. To fill this gap, we propose \emph{dynamic weight grafting}, a technique that selectively grafts weights from a finetuned model onto a pretrained model. Using this technique, we show two separate pathways for retrieving finetuned relation information: 1) ``enriching'' the residual stream with relation information while processing the tokens that correspond to an entity (e.g., ``Zendaya'' in ``Zendaya co-starred with John David Washington'') and 2) ``recalling'' this information at the final token position before generating a target fact. In some cases, models need information from both of these pathways to correctly generate finetuned facts, while in other cases either the ``enrichment'' or the ``recall'' pathway alone is sufficient. We localize the ``recall'' pathway to model components, finding that ``recall'' occurs via both task-specific attention mechanisms and an entity-specific extraction step in the feedforward networks of the final layers before the target prediction. By targeting model components and parameters, as opposed to just activations, we are able to understand the \textit{mechanisms} by which finetuned knowledge is retrieved during generation.
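
To make the idea of grafting weights per token position and per component concrete, here is a minimal toy sketch in pure Python. It is not the authors' implementation. The single residual layer, the uniform causal "attention", the component names (`attn_value`, `attn_out`, `ffn`), and the helper `grafted_layer` are all assumptions made for illustration. The sketch only shows the selection rule: at each position, each component takes its weights from either the pretrained or the finetuned model, and the computation then proceeds as usual.

```python
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]
Weights = Dict[str, Matrix]  # component name -> weight matrix (hypothetical names)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(a: Vector, b: Vector) -> Vector:
    """Elementwise sum of two vectors (the residual connection)."""
    return [x + y for x, y in zip(a, b)]


def grafted_layer(
    hidden: List[Vector],
    pretrained: Weights,
    finetuned: Weights,
    use_finetuned: Callable[[int, str], bool],
) -> List[Vector]:
    """Toy residual layer whose weights are chosen per position and component."""

    def pick(pos: int, name: str) -> Matrix:
        source = finetuned if use_finetuned(pos, name) else pretrained
        return source[name]

    # Values are computed at each source position with that position's weights,
    # so grafting at an entity token changes what later tokens can read.
    values = [matvec(pick(j, "attn_value"), h) for j, h in enumerate(hidden)]

    out = []
    for pos, h in enumerate(hidden):
        # Assumption: uniform causal attention (mean over positions <= pos).
        ctx = [sum(v[d] for v in values[: pos + 1]) / (pos + 1) for d in range(len(h))]
        mid = add(h, matvec(pick(pos, "attn_out"), ctx))
        out.append(add(mid, matvec(pick(pos, "ffn"), mid)))
    return out


# Hypothetical toy weights and inputs, chosen so the arithmetic is exact.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
PRETRAINED = {"attn_value": Z2, "attn_out": I2, "ffn": Z2}
FINETUNED = {"attn_value": I2, "attn_out": I2, "ffn": I2}
HIDDEN = [[3.0, 0.0], [0.0, 0.0], [0.0, 3.0]]  # entity token, filler, final token
ENTITY, FINAL = 0, 2

baseline = grafted_layer(HIDDEN, PRETRAINED, FINETUNED, lambda pos, name: False)
enrich = grafted_layer(HIDDEN, PRETRAINED, FINETUNED, lambda pos, name: pos == ENTITY)
recall = grafted_layer(HIDDEN, PRETRAINED, FINETUNED, lambda pos, name: pos == FINAL)
ffn_only = grafted_layer(
    HIDDEN, PRETRAINED, FINETUNED, lambda pos, name: pos == FINAL and name == "ffn"
)

print(baseline[FINAL], enrich[FINAL], recall[FINAL], ffn_only[FINAL])
# prints: [0.0, 3.0] [1.0, 3.0] [0.0, 8.0] [0.0, 6.0]
```

In this toy setup, grafting only at the entity position changes the final-token state through the attention values, loosely mirroring the ``enrichment'' pathway. Grafting only at the final position changes it through that position's own components, loosely mirroring ``recall''. Unlike activation patching, nothing in the residual stream is overwritten; only the weights that produce it are swapped.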
