Causality $\neq$ Invariance: Function and Concept Vectors in LLMs

Gustaw Opiełka, Hannes Rosenbusch, Claire E. Stevenson

TL;DR

LLMs do contain abstract concept representations, but these differ from the representations that drive ICL performance. The authors identify them as Concept Vectors (CVs), which encode concepts more stably across input formats than Function Vectors (FVs).

Abstract

Do large language models (LLMs) represent concepts abstractly, i.e., independent of input format? We revisit Function Vectors (FVs), compact representations of in-context learning (ICL) tasks that causally drive task performance. Across multiple LLMs, we show that FVs are not fully invariant: FVs are nearly orthogonal when extracted from different input formats (e.g., open-ended vs. multiple-choice), even if both target the same concept. We identify Concept Vectors (CVs), which carry more stable concept representations. Like FVs, CVs are composed of attention head outputs; however, unlike FVs, the constituent heads are selected using Representational Similarity Analysis (RSA) based on whether they encode concepts consistently across input formats. While these heads emerge in similar layers to FV-related heads, the two sets are largely distinct, suggesting different underlying mechanisms. Steering experiments reveal that FVs excel in-distribution, when extraction and application formats match (e.g., both open-ended in English), while CVs generalize better out-of-distribution across both question types (open-ended vs. multiple-choice) and languages. Our results show that LLMs do contain abstract concept representations, but these differ from those that drive ICL performance.

Paper Structure (37 sections, 9 equations, 24 figures, 4 tables)

Figures (24)

  • Figure 1: Function vs. Concept Vectors. Top: Similarity matrices for $\mathcal{FV}$s (left) and $\mathcal{CV}$s (right) in Llama 3.1 70B; cells show how similar two prompt representations are (warmer = more similar). Middle: Schematic highlighting the distinction between heads with causal effect (Activation Patching-selected) and heads that encode format-invariant structure (RSA-selected). Bottom: Example prompts for two concepts across three formats (EN open‑ended, FR open‑ended, multiple‑choice). Takeaway: $\mathcal{FV}$s cluster by input format; $\mathcal{CV}$s cluster by concept across formats.
  • Figure 2: Representational Similarity Analysis (RSA). For each attention head, we compute a representational similarity matrix (RSM) over prompts spanning concepts and input formats (cosine similarity of head outputs). We construct a binary design matrix that marks pairs sharing the same concept, independent of format. The RSA score for a head is Spearman's $\rho$ between the lower‑triangular entries of the RSM and the design matrix; higher $\rho$ indicates stronger concept‑invariant encoding.
  • Figure 3: Similarity matrices. Full similarity matrices extracted from the top $K = 5$ heads in $\mathcal{CV}$s and $\mathcal{FV}$s in Llama 3.1 70B for all concepts. See Appendix \ref{sim_mats_all} for other models.
  • Figure 4: Concept vs. format RSA. Question type and Concept RSA scores for $\mathcal{CV}$s and $\mathcal{FV}$s in all models. Takeaway: $\mathcal{CV}$s encode more concept information and less input format than $\mathcal{FV}$s.
  • Figure 5: Layer‑wise AIE vs. RSA. AIE and RSA scores averaged across all heads per layer. Takeaway: $\mathcal{FV}$ and $\mathcal{CV}$ heads are in similar layers.
  • ...and 19 more figures
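
The RSA score described in the Figure 2 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`average_ranks`, `spearman_rho`, `rsa_score`), the choice to exclude the diagonal from the lower triangle, and the synthetic "head outputs" are all hypothetical. The toy data contains one head that encodes the concept and one that encodes the input format, to show how the score is meant to separate them.

```python
import numpy as np

def average_ranks(x):
    """1-based ranks; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def spearman_rho(a, b):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

def rsa_score(head_outputs, concept_ids):
    """RSA score for one head.

    head_outputs: (n_prompts, d) array, one output vector per prompt.
    concept_ids:  (n_prompts,) array of concept labels.
    """
    X = np.asarray(head_outputs, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    rsm = X @ X.T  # cosine similarity between every pair of prompts

    # Binary design matrix: 1 where two prompts share a concept,
    # regardless of their input format.
    concept_ids = np.asarray(concept_ids)
    design = (concept_ids[:, None] == concept_ids[None, :]).astype(float)

    # Assumption: use the strict lower triangle (diagonal excluded).
    rows, cols = np.tril_indices(len(concept_ids), k=-1)
    return spearman_rho(rsm[rows, cols], design[rows, cols])

# Synthetic example (hypothetical data): 3 concepts x 2 formats = 6 prompts.
rng = np.random.default_rng(0)
concepts = np.repeat(np.arange(3), 2)  # concept label per prompt
formats = np.tile(np.arange(2), 3)     # format label per prompt
basis = np.eye(8)

# One head whose output tracks the concept, one whose output tracks the format.
concept_head = basis[concepts] + 0.05 * rng.standard_normal((6, 8))
format_head = basis[3 + formats] + 0.05 * rng.standard_normal((6, 8))

concept_score = rsa_score(concept_head, concepts)
format_score = rsa_score(format_head, concepts)
print("concept-encoding head:", round(concept_score, 3))
print("format-encoding head: ", round(format_score, 3))
```

In this toy setup the concept-encoding head receives the higher score, which is the property the caption attributes to heads selected for $\mathcal{CV}$s. Because the design matrix is binary, rank ties are unavoidable, so the sketch averages tied ranks before correlating.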