Vector Arithmetic in Concept and Token Subspaces
Sheridan Feucht, Byron Wallace, David Bau
TL;DR
The paper addresses the challenge of performing word2vec-like vector arithmetic in LLM hidden states by showing that semantic and surface information reside in distinct subspaces discoverable via specialized attention heads. It introduces concept and token lenses, $L_{C_k}$ and $L_{T_k}$, built from the top-$k$ concept and token induction heads respectively, to extract these subspaces, and tests parallelogram-style relations on the transformed hidden states, e.g., $L a - L b = L a' - L b'$, where $L a'$ is expected to be the nearest neighbor of $L a - L b + L b'$. Empirical results on Llama-2-7b show substantially higher nearest-neighbor accuracy on semantic tasks with the concept lens (up to ~80%) than with raw hidden states (~47%), and better performance on surface-level transformations with token lenses. The work also shows that the effective subspaces are low-dimensional: high performance is retained under strong rank reduction (down to $r=256$). Overall, the paper reveals a geometric, interpretable structure in activation space and provides a practical method for using it in semantic and surface-level word manipulations.
Abstract
In order to predict the next token, LLMs must represent semantic and surface-level information about the current word. Previous work identified two types of attention heads that disentangle this information: (i) Concept induction heads, which copy word meanings, and (ii) Token induction heads, which copy literal token representations (Feucht et al., 2025). We show that these heads can be used to identify subspaces of model activations that exhibit coherent semantic structure in Llama-2-7b. Specifically, when we transform hidden states using the attention weights of concept heads, we are able to more accurately perform parallelogram arithmetic (Mikolov et al., 2013) on the resulting hidden states, e.g., showing that "Athens" - "Greece" + "China" = "Beijing". This transformation allows for much higher nearest-neighbor accuracy (80%) than direct use of raw hidden states (47%). Analogously, we show that token heads allow for transformations that reveal surface-level word information in hidden states, allowing for operations like "coding" - "code" + "dance" = "dancing".
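The parallelogram test described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`apply_lens`, `analogy`), the 3-dimensional toy vectors, the hand-written lens matrix, the use of Euclidean distance, and the exclusion of the three query words from the candidate set are all assumptions made for illustration. In the paper, the lens is derived from attention-head weights of Llama-2-7b and applied to real hidden states.

```python
from math import dist

def apply_lens(lens, h):
    """Matrix-vector product: transform hidden state h with the lens matrix."""
    return [sum(w * x for w, x in zip(row, h)) for row in lens]

def analogy(lens, states, a, b, b_prime):
    """Return the word whose lensed state is nearest to L a - L b + L b'.

    Assumption: the query words a, b, b' are excluded from the candidates,
    and distance is Euclidean (a cosine metric could be swapped in).
    """
    z = {word: apply_lens(lens, h) for word, h in states.items()}
    target = [x - y + t for x, y, t in zip(z[a], z[b], z[b_prime])]
    candidates = [word for word in z if word not in (a, b, b_prime)]
    return min(candidates, key=lambda word: dist(z[word], target))

# Hypothetical toy hidden states: the first two coordinates carry "meaning",
# the third carries unrelated surface-level variation.
STATES = {
    "Greece":  [1.0, 0.0,  0.0],
    "Athens":  [1.0, 1.0,  3.0],
    "China":   [3.0, 0.0,  0.0],
    "Beijing": [3.0, 1.0, -3.0],
    "Tokyo":   [4.0, 1.0,  3.0],  # distractor
}

# Raw hidden states correspond to an identity "lens".
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical concept lens: keeps the two semantic coordinates only.
CONCEPT_LENS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

print(analogy(IDENTITY, STATES, "Athens", "Greece", "China"))      # Tokyo
print(analogy(CONCEPT_LENS, STATES, "Athens", "Greece", "China"))  # Beijing
```

In this toy setup the surface coordinate pulls the raw-state query toward the distractor, while projecting onto the semantic coordinates recovers "Beijing". The same `analogy` function would apply to a token lens and a surface-level query such as "coding" - "code" + "dance", given suitable states and lens matrix.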
