Locating and Editing Factual Associations in Mamba

Arnab Sen Sharma, David Atkinson, David Bau

TL;DR

This study investigates whether the factual recall mechanisms found in transformer LMs generalize to Mamba, a state-space recurrent architecture. By repurposing activation patching, causal tracing, rank-one edits (ROME), linearity probes (LRE), and attention-knockout approaches, the authors map recall localization, editability, and relational representations in Mamba-2.8b and compare them with those of a similarly sized transformer, Pythia-2.8b. They find that recall localizes in mid-layers and late residual projections, that edits to $\text{W}_o$ yield robust and generalizable factual edits, and that LRE captures partial, relation-dependent linear structure; attention-knockout insights transfer only partially because of architectural differences. Overall, the work suggests that autoregressive prompting imposes a locality pattern across architectures, while highlighting where interpretability tools must be adapted for state-space models and pointing to practical implications for reliable factual editing and transparent reasoning in diverse LMs.

Abstract

We investigate the mechanisms of factual recall in the Mamba state space model. Our work is inspired by previous findings in autoregressive transformer language models suggesting that their knowledge recall is localized to particular modules at specific token locations; we therefore ask whether factual recall in Mamba can be similarly localized. To investigate this, we conduct four lines of experiments on Mamba. First, we apply causal tracing or interchange interventions to localize key components inside Mamba that are responsible for recalling facts, revealing that specific components within middle layers show strong causal effects at the last token of the subject, while the causal effect of intervening on later layers is most pronounced at the last token of the prompt, matching previous findings on autoregressive transformers. Second, we show that rank-one model editing methods can successfully insert facts at specific locations, again resembling findings on transformer LMs. Third, we examine the linearity of Mamba's representations of factual relations. Finally, we adapt attention-knockout techniques to Mamba in order to dissect information flow during factual recall. We compare Mamba directly to a similarly sized autoregressive transformer LM and conclude that, despite significant differences in architectural approach, the two architectures share many similarities when it comes to factual recall.

Paper Structure (22 sections, 6 equations, 14 figures)


Figures (14)

  • Figure 1: Architecture of a MambaBlock. The projection matrices $\text{W}_a^{\ell}$ and $\text{W}_g^{\ell}$ have the shape $2d \times d$, while $\text{W}_o^{\ell}$ has the shape $d \times 2d$. $h$, $a$, $g$, $s$, and $o$ are intermediate states of a token representation. $\sigma$ is the SiLU activation and $\otimes$ is elementwise multiplication. The Conv + SSM operation abstracts the Conv1D and selective-SSM operations.
  • Figure 2: (a) Activation patching. A state from the clean run $G$ is patched into its corresponding position in the corrupted run $G^*$. This has a downstream effect of potentially changing all the states that depend on the patched state in $G^*[\leftarrow h_i^{(\ell)}]$. (b) Average indirect effect of applying causal tracing to residual stream states ($h_i^{(\ell)}$ in Figure 1) across 400 different facts from the Relations dataset (described in the paper's appendix).
  • Figure 3: Average indirect effect of different states $o_i^{(\ell)}$, $g_i^{(\ell)}$, and $s_i^{(\ell)}$ over 400 facts from the Relations dataset (described in the paper's appendix). For each layer $\ell$, states for a window of 10 layers around $\ell$ are restored from the clean run $G$.
  • Figure 4: To probe for path-specific effects, (a) $h_i^{(\ell)}$ is restored from the clean run $G$ as in Figure 2. (b) Then, to reveal the role of the Conv + SSM contributions, $s_i$ states from the corrupted run $G^*$ are also patched to block the contributions from those paths.
  • Figure 5: Impact of ablating $s_i$, $g_i$, and $o_i$ on $\text{IE}_{h_i^{(\ell)}}$ at the (a) subject-last and (b) prompt-last token positions. Taken together, (a) and (b) show a clear separation of roles between early-to-mid and later layers in Mamba-2.8b. Up to layer $46$, $h_i^{(\ell)}$ shows strong IE only at the subject-last token position and has negligible impact after that, whereas at the prompt-last token position the IE of $h_i^{(\ell)}$ jumps to $1.0$ after layer $46$. (a) also shows that, at the subject-last token, before layers $27$–$28$, $\text{IE}_{h_i^{(\ell)}}$ is significantly reduced by blocking any one of the $o_i$, $g_i$, or $s_i$ paths (listed in descending order of damage to $\text{IE}_{h_i^{(\ell)}}$). (b) At the prompt-last token, ablating the $o_i$ or $s_i$ paths can significantly reduce $\text{IE}_{h_i^{(\ell)}}$ in layers $47$–$50$.
  • ...and 9 more figures
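
To make the shapes in the Figure 1 caption concrete, here is a minimal NumPy sketch of one MambaBlock forward pass. It is an illustration under stated assumptions, not the authors' implementation: the wiring ($a = \text{W}_a h$, $g = \sigma(\text{W}_g h)$, $o = \text{W}_o (s \otimes g)$, followed by a residual addition) is inferred from the caption, normalization is omitted, and `conv_ssm_placeholder` is a hypothetical causal running mean that stands in for the real Conv1D and selective-SSM operations.

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def conv_ssm_placeholder(a):
    # Hypothetical stand-in for Conv1D + selective SSM: a causal running mean
    # over token positions. It only preserves the property that position i
    # depends on positions <= i; it is NOT the real operation.
    counts = np.arange(1, a.shape[0] + 1)[:, None]
    return np.cumsum(a, axis=0) / counts

def mamba_block(h, W_a, W_g, W_o):
    # h: (T, d) residual states for T tokens
    a = h @ W_a.T                 # (T, 2d), W_a has shape 2d x d
    g = silu(h @ W_g.T)           # (T, 2d), gate path, W_g has shape 2d x d
    s = conv_ssm_placeholder(a)   # (T, 2d), the only path that mixes tokens
    o = (s * g) @ W_o.T           # (T, d), W_o has shape d x 2d
    # Assumed residual update: the block output is added back to h.
    return h + o, {"a": a, "g": g, "s": s, "o": o}

rng = np.random.default_rng(0)
T, d = 5, 8                       # toy sizes chosen for illustration
h = rng.normal(size=(T, d))
W_a = rng.normal(size=(2 * d, d)) * 0.1
W_g = rng.normal(size=(2 * d, d)) * 0.1
W_o = rng.normal(size=(d, 2 * d)) * 0.1

h_next, states = mamba_block(h, W_a, W_g, W_o)
print(h_next.shape, states["s"].shape)
```

In this sketch, cross-token information enters only through the $s$ path, while $g$ and $\text{W}_o$ act on each token independently, which is why the figures examine the $s_i$, $g_i$, and $o_i$ states separately.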
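
The activation patching procedure in the Figure 2 caption can be sketched on a toy causal model. Everything here is an assumption made for illustration: the model (`run`), its sizes, the noise scale, and the choice of subject positions are hypothetical, and the indirect effect is computed in a normalized form, $(p^*[\leftarrow h_i^{(\ell)}] - p^*) / (p - p^*)$, chosen because the Figure 5 caption reports IE values reaching $1.0$; the paper's exact definition may differ.

```python
import numpy as np

def run(emb, weights, readout, patch=None):
    # Toy causal model: each layer mixes tokens with a causal running mean.
    # patch = (layer, position, vector) overwrites one residual state.
    h = emb.copy()
    states = []
    counts = np.arange(1, len(h) + 1)[:, None]
    for layer, W in enumerate(weights):
        mixed = np.cumsum(h, axis=0) / counts
        h = h + np.tanh(mixed @ W)
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]
        states.append(h.copy())
    logits = h[-1] @ readout          # predict from the last token
    p = np.exp(logits - logits.max())
    return p / p.sum(), states

rng = np.random.default_rng(0)
T, d, V, L = 6, 16, 10, 4             # tokens, width, vocab, layers (toy sizes)
weights = [rng.normal(size=(d, d)) * 0.3 for _ in range(L)]
readout = rng.normal(size=(d, V))

clean_emb = rng.normal(size=(T, d))
subject = [1, 2]                      # hypothetical subject token positions
corrupt_emb = clean_emb.copy()
corrupt_emb[subject] += rng.normal(size=(len(subject), d)) * 3.0

p_clean, clean_states = run(clean_emb, weights, readout)      # clean run G
p_corrupt, _ = run(corrupt_emb, weights, readout)             # corrupted run G*
target = int(np.argmax(p_clean))      # the clean run's answer token

def indirect_effect(layer, pos):
    # Patch the clean state into the corrupted run, then measure how much of
    # the clean answer probability is recovered (normalized form, assumed).
    patch = (layer, pos, clean_states[layer][pos])
    p_patched, _ = run(corrupt_emb, weights, readout, patch=patch)
    gap = p_clean[target] - p_corrupt[target]
    return (p_patched[target] - p_corrupt[target]) / gap

ie = np.array([[indirect_effect(l, i) for i in range(T)] for l in range(L)])
print(np.round(ie, 3))
```

Two sanity checks follow from the construction: patching the final layer at the last token restores the clean output exactly, and patching a position before the subject changes nothing, because those states are identical in both runs.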