Multiple Streams of Knowledge Retrieval: Enriching and Recalling in Transformers

Todd Nief, David Reber, Sean Richardson, Ari Holtzman

TL;DR

This work introduces dynamic weight grafting to causally localize how finetuned knowledge is retrieved in transformer-based LLMs, addressing limitations of activation patching. The authors reveal two retrieval pathways (an enrichment pathway at entity positions and a recall pathway near the final token) and demonstrate that either pathway can suffice for relation completion under certain prompts. By performing token-wise, component-wise grafting, they localize recall to the final-token output projection and FFN, and show that task-specific attention at the first entity is also crucial, enabling a fine-grained mechanistic understanding of knowledge recall. The study spans multiple models and synthetic datasets, highlighting how finetuned information is retrieved through distinct, interpretable components, with implications for interpretability, targeted editing, and safer knowledge updates in LLMs.

Abstract

When an LLM learns a new fact during finetuning (e.g., new movie releases or a newly elected pope), where does this information go? Are entities enriched with relation information, or do models recall information just-in-time before a prediction? Or are "all of the above" true, with LLMs implementing multiple redundant heuristics? Existing localization approaches (e.g., activation patching) are ill-suited for this analysis because they usually replace parts of the residual stream, thus overriding previous information. To fill this gap, we propose dynamic weight grafting, a technique that selectively grafts weights from a finetuned model onto a pretrained model. Using this technique, we show two separate pathways for retrieving finetuned relation information: 1) "enriching" the residual stream with relation information while processing the tokens that correspond to an entity (e.g., "Zendaya" in "Zendaya co-starred with John David Washington") and 2) "recalling" this information at the final token position before generating a target fact. In some cases, models need information from both of these pathways to correctly generate finetuned facts while, in other cases, either the "enrichment" or "recall" pathway alone is sufficient. We localize the "recall" pathway to model components, finding that "recall" occurs via both task-specific attention mechanisms and an entity-specific extraction step in the feedforward networks of the final layers before the target prediction. By targeting model components and parameters, as opposed to just activations, we are able to understand the mechanisms by which finetuned knowledge is retrieved during generation.
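
The contrast the abstract draws between replacing activations and grafting weights can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and not the authors' implementation: the residual stream is a list of floats, a "layer" is a single scalar weight, and the function names `layer`, `activation_patch`, and `weight_graft` are hypothetical.

```python
from typing import List


def layer(h: List[float], w: float) -> List[float]:
    """Toy residual layer: the update w * h is added to the incoming stream."""
    return [x + w * x for x in h]


def activation_patch(h_current: List[float], w_pre: float, donor_out: List[float]) -> List[float]:
    """Overwrite this layer's output with activations recorded from another run."""
    _ = layer(h_current, w_pre)  # the current run's result is computed, then discarded
    return list(donor_out)


def weight_graft(h_current: List[float], w_sft: float) -> List[float]:
    """Keep the current stream and swap only the weights that act on it."""
    return layer(h_current, w_sft)


w_pre, w_sft = 0.25, 0.5          # hypothetical pretrained / finetuned weights
donor = layer([4.0, 4.0], w_sft)  # activations from some other (donor) run

patched_a = activation_patch([1.0, 2.0], w_pre, donor)
patched_b = activation_patch([3.0, 5.0], w_pre, donor)
grafted_a = weight_graft([1.0, 2.0], w_sft)
grafted_b = weight_graft([3.0, 5.0], w_sft)
```

In this toy setting the patched output is the same whatever the current run had accumulated in its stream, whereas the grafted output still depends on it; that is the sense in which patching overrides previous information and grafting swaps only the mechanism.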

Paper Structure

This paper contains 64 sections, 3 equations, 20 figures, 2 tables, 1 algorithm.

Figures (20)

  • Figure 1: We introduce dynamic weight grafting, which swaps the weights of a pretrained model for the weights of a model that has undergone supervised finetuning (SFT). (a) We compare dynamic weight grafting to activation patching. In activation patching, we replace model activations at a specific point with activations from another run. In dynamic weight grafting, we replace mechanisms by swapping specific parameter matrices of a finetuned model into a pretrained model. (b) A schematic showing the different dynamic weight grafting schemes used in our experiments. In position grafting, we use either the entire pretrained or finetuned model at a given token position. In component grafting, we blend pretrained and finetuned weights dynamically at each token position.
  • Figure 2: We show top-5 accuracy for position grafting for the headline test sentence. Grafting configurations are PRE (pretrained baseline), SFT (supervised finetuning baseline), FE (grafting only the first entity), LT (grafting only the last token position), FE+LT, (FE+LT)$^\text{C}$ (grafting everything except the first entity and last token position), FE$^\text{C}$, and LT$^\text{C}$. All models show nearly full SFT performance by grafting only the FE and LT tokens and near pretrained performance when grafting everything except the FE and LT tokens. We present results for Gemma and GPT-2 XL. Results for Llama (similar to Gemma) and Pythia (similar to GPT-2 XL) are available in the appendix on additional results.
  • Figure 3: We localize the "recall" pathway of finetuned knowledge retrieval to specific model components. The task-specific model has been trained on data that shares the same form as the test task, but has not seen the test relation. The relation-specific model has been trained on the test relations. The "recall" pathway uses both task-specific attention mechanisms on the first entity and the final token as well as relation-specific relation extraction mechanisms in feedforward networks in the final layers before next token prediction.
  • Figure 4: We graft weights from models finetuned on both directions of a symmetric relationship at the last token to identify which components drive relation completion. For Gemma and Llama, the output projection matrix and feedforward networks in the last quarter of the model recover most of the finetuned performance.
  • Figure 5: We graft from both a task model and a relation model onto a pretrained model, creating a hybrid model. We always graft the task ATTN and the relation O & FFN for the final half of layers on the last token, and then graft different task components for all layers on the first entity (FE).
  • ...and 15 more figures
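
The position grafting and component grafting schemes described in the Figure 1 and Figure 2 captions can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the paper's code: the residual stream is one float per token, attention is a causal mean, each layer has three scalar weights, and the names `forward`, `position_graft`, `component_graft`, and the 4-token prompt layout are all hypothetical.

```python
from typing import Callable, Dict, List, Set

Layer = Dict[str, float]  # toy layer weights: "v" (value), "o" (output projection), "ffn"
Chooser = Callable[[int, int, str], bool]  # (position, layer, component) -> use finetuned?


def forward(x: List[float], pre: List[Layer], sft: List[Layer], use_sft: Chooser) -> List[float]:
    """Toy causal model with a scalar residual stream (illustrative only)."""
    h = list(x)
    for li, (p, s) in enumerate(zip(pre, sft)):
        def w(pos: int, comp: str) -> float:
            return s[comp] if use_sft(pos, li, comp) else p[comp]

        # Values are produced at the source position, with that position's weights.
        v = [w(t, "v") * h[t] for t in range(len(h))]
        # Causal mean attention; the output projection belongs to the destination position.
        h = [h[t] + w(t, "o") * sum(v[: t + 1]) / (t + 1) for t in range(len(h))]
        # Feedforward update, again chosen per position.
        h = [h[t] + w(t, "ffn") * h[t] for t in range(len(h))]
    return h


def position_graft(positions: Set[int]) -> Chooser:
    """Position grafting: the whole finetuned model at the given token positions."""
    return lambda pos, layer, comp: pos in positions


def component_graft(positions: Set[int], layers: Set[int], comps: Set[str]) -> Chooser:
    """Component grafting: finetuned weights only for selected components and layers."""
    return lambda pos, layer, comp: pos in positions and layer in layers and comp in comps


# Hypothetical 4-token prompt: position 0 is the first entity (FE), position 3 the last token (LT).
x = [1.0, 0.5, -0.5, 2.0]
pre = [{"v": 0.1, "o": 0.2, "ffn": 0.3}] * 2
sft = [{"v": 0.4, "o": 0.1, "ffn": 0.5}] * 2
FE, LT = {0}, {3}

out_pre = forward(x, pre, sft, position_graft(set()))        # PRE baseline
out_sft = forward(x, pre, sft, position_graft({0, 1, 2, 3}))  # SFT baseline
out_lt = forward(x, pre, sft, position_graft(LT))             # LT only
out_fe_lt = forward(x, pre, sft, position_graft(FE | LT))     # FE+LT
out_lt_o_ffn = forward(x, pre, sft, component_graft(LT, {1}, {"o", "ffn"}))
```

Because the toy model is causal, grafting only at the last token leaves every earlier position identical to the pretrained run, while grafting at the first entity changes what later positions can read from it. The complement configurations in Figure 2 would correspond to passing the complementary position set to `position_graft`.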