Vector Arithmetic in Concept and Token Subspaces

Sheridan Feucht, Byron Wallace, David Bau

TL;DR

The paper addresses the challenge of performing word2vec-like vector arithmetic in LLM hidden states by showing that semantic and surface information reside in distinct subspaces discoverable via specialized attention heads. It introduces concept and token lenses, $L_{C_k}$ and $L_{T_k}$, built from top-$k$ induction heads to extract these subspaces and tests parallelogram-style relations using transformed hidden states, e.g., $L a - L b = L a' - L b'$ with $L a'$ as the nearest neighbor. Empirical results on Llama-2-7b demonstrate substantially higher nearest-neighbor accuracy (up to ~80%) in semantic tasks compared to raw hidden states (~47%), and better performance on surface-level transformations with token lenses. The work also shows that the effective subspaces are low-dimensional and that high performance is retained under strong rank reduction (down to $r=256$). Overall, the paper reveals a geometrical, interpretable structure in activation spaces and provides a practical method to harness it for semantic and surface-level word manipulations.

Abstract

In order to predict the next token, LLMs must represent semantic and surface-level information about the current word. Previous work identified two types of attention heads that disentangle this information: (i) Concept induction heads, which copy word meanings, and (ii) Token induction heads, which copy literal token representations (Feucht et al., 2025). We show that these heads can be used to identify subspaces of model activations that exhibit coherent semantic structure in Llama-2-7b. Specifically, when we transform hidden states using the attention weights of concept heads, we are able to more accurately perform parallelogram arithmetic (Mikolov et al., 2013) on the resulting hidden states, e.g., showing that "Athens" - "Greece" + "China" = "Beijing". This transformation allows for much higher nearest-neighbor accuracy (80%) than direct use of raw hidden states (47%). Analogously, we show that token heads allow for transformations that reveal surface-level word information in hidden states, allowing for operations like "coding" - "code" + "dance" = "dancing".
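
The parallelogram test described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-dimensional toy vectors, the hypothetical `toy_lens` matrix (which simply zeroes a nuisance coordinate), the use of cosine similarity, and the choice not to exclude the query words from the candidate set. It is not the authors' implementation, and the real lenses are derived from attention heads rather than written by hand.

```python
import math

def matvec(L, v):
    """Multiply matrix L (a list of rows) by vector v."""
    return [sum(l * x for l, x in zip(row, v)) for row in L]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(L, vecs, a, b, b_prime):
    """Return the word whose lensed vector is nearest to L a - L b + L b'."""
    lensed = {w: matvec(L, v) for w, v in vecs.items()}
    query = [x - y + z for x, y, z in zip(lensed[a], lensed[b], lensed[b_prime])]
    # Assumption: all words, including the query words, are candidates.
    return max(lensed, key=lambda w: cosine(query, lensed[w]))

# Contrived toy "hidden states": [country id, is-capital flag, nuisance coordinate].
vecs = {
    "Greece":  [1.0, 0.0,  0.9],
    "Athens":  [1.0, 1.0, -0.8],
    "China":   [2.0, 0.0, -0.7],
    "Beijing": [2.0, 1.0,  0.9],
}

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # raw hidden states
toy_lens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]  # hypothetical lens

print(analogy(toy_lens, vecs, "Athens", "Greece", "China"))  # → Beijing
print(analogy(identity, vecs, "Athens", "Greece", "China"))  # → Athens
```

In this toy setting the nuisance coordinate pulls the raw query away from "Beijing", while the lensed query lands on it exactly. The vectors were chosen by hand to show the mechanics of the test and say nothing about the reported accuracies.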

Paper Structure

This paper contains 8 sections, 1 equation, 6 figures, 1 table.

Figures (6)

  • Figure 1: word2vec-style vector arithmetic is more accurate when working in subspaces from Feucht et al. (2025) instead of using raw hidden states. (a) To extract embeddings for a word, we prefix with a constant phrase (e.g. "She travelled to") and save the last token representation of the word at a chosen layer $\ell$. To extract conceptual or token information from this vector, we multiply by concept and token lenses $L_{C_k}$ and $L_{T_k}$ respectively (see the section on lenses). (b) Using a vector from a separate context to represent each word, we measure whether Athens - Greece + China has Beijing as its top nearest neighbor. (c) For semantic tasks like capital cities and gender-based family words, doing vector arithmetic in the subspace of the top-$k$ concept heads (red) is more effective than using raw hidden states (orange), the top-$k$ token heads (blue), or the sum of all attention head OV matrices (green). On the other hand, the subspace read by the top-$k$ token heads is most effective for grammatical tasks that involve changing the spelling of a word (e.g., code$\rightarrow$coding). For comparison, dotted gray lines represent random chance, whereas dotted light blue represents Llama-2-7b's 5-shot ICL accuracy for this task. We use $k=80$, as found in Feucht et al. (2025).
  • Figure 2: Nearest-neighbor accuracy for all word2vec tasks (Mikolov et al., 2013) with prefixes for each task in Table 1 (Llama-2-7b). Dotted gray lines indicate guessing accuracy (out of all possible neighbors/words in the dataset). Dotted light blue lines indicate 5-shot ICL accuracy for this task, i.e., the best possible performance this model can have for this task. We do not expect high performance for the "opposite" task due to its cyclic nature: to represent the concept of "opposite," we need possible - impossible = impossible - possible, which is incompatible with parallelogram arithmetic. Targeted subspaces are more effective than using all attention heads for most tasks, except for gram1, gram3, and gram4.
  • Figure 3: Reducing the rank of $L$ by taking the top-$r$ singular components does not damage nearest-neighbor accuracy. (a) Inspecting the singular values of our concept lens, $L_{C_k}$, and token lens, $L_{T_k}$, these transformations appear to be full-rank. (b) Regardless, we take rank-$r$ approximations of these transformations by setting all singular values after the top-$r$ values to zero. (c) We choose the best layer for each task from Figure 1 and reduce the rank of every $L$ in this way. Performance is maintained for ranks as low as $r=256$. Note that values for $r=4096$ are the same as results from Figure 1.
  • Figure 4: Nearest-neighbor accuracy for all word2vec tasks (Mikolov et al., 2013) without any prefixes (i.e., feeding each word to the model by itself with no context). Comparing with Figure 2, certain tasks like "currency" are much less accurate; this may be because currencies like "real" are not immediately recognizable out of context. However, accuracy is slightly better for "capital-common-countries" and "gram6-nationality-adjective" without any prefixes.
  • Figure 5: Nearest-neighbor accuracy for all function vector tasks (Todd et al., 2024) with prefixes for each task listed in Table 1. Dotted gray lines indicate guessing accuracy (out of all possible neighbors/words in the dataset). Dotted light blue lines indicate 5-shot ICL accuracy, i.e., the best possible performance this model can have for this task. The failure of many of these tasks is unsurprising: some tasks are many-to-one relations that may not be represented as parallelograms ("capitalize-first-letter"), whereas others may be too complex to be directly encoded in the model's embedding space ("national-parks"). Note: "country-currency" includes more countries (197) than the word2vec "currency" task (30).
  • ...and 1 more figure
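
The rank reduction described in the Figure 3 caption (keep the top-$r$ singular values of a lens $L$ and set the rest to zero) can be sketched with NumPy as follows. The helper name `truncate_rank` and the random 8×8 matrix standing in for a lens are assumptions for illustration only; the lenses in the paper act on Llama-2-7b hidden states of dimension 4096.

```python
import numpy as np

def truncate_rank(L, r):
    """Rank-r approximation of L: keep the top-r singular values, zero the rest."""
    U, S, Vt = np.linalg.svd(L, full_matrices=False)
    S_trunc = S.copy()
    S_trunc[r:] = 0.0               # singular values are returned in descending order
    return (U * S_trunc) @ Vt       # same as U @ diag(S_trunc) @ Vt

rng = np.random.default_rng(0)
L = rng.standard_normal((8, 8))     # hypothetical stand-in for a lens matrix
L_r = truncate_rank(L, 3)

print(np.linalg.matrix_rank(L_r))            # rank after truncation
print(np.allclose(truncate_rank(L, 8), L))   # keeping every component recovers L
```

Keeping all singular values reproduces the original matrix, which mirrors the caption's remark that the $r=4096$ results match the unreduced ones.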