Analogical Reasoning Inside Large Language Models: Concept Vectors and the Limits of Abstraction

Gustaw Opiełka, Hannes Rosenbusch, Claire E. Stevenson

TL;DR

This work formalizes abstraction as invariance and investigates whether large language models (LLMs) manifest internal abstract representations. Using function vectors (FVs) and representational similarity analysis (RSA), the authors show FVs are not invariant to low-level input changes and tend to encode multiple task attributes, while verbal concept vectors (CVs) emerge as invariant, concept-specific detectors that can causally influence behavior. CVs reliably capture verbal concepts but fail to produce invariant representations for abstract concepts like 'previous' and 'next', suggesting current LLMs generalize poorly to new domains that require abstract relational reasoning. The findings imply internal knowledge in LLMs is context-dependent and not grounded in reusable abstract concepts, highlighting limits to analogical reasoning and guiding future work on fostering true abstraction in AI systems.

Abstract

Analogical reasoning relies on conceptual abstractions, but it is unclear whether Large Language Models (LLMs) harbor such internal representations. We explore distilled representations from LLM activations and find that function vectors (FVs; Todd et al., 2024) - compact representations for in-context learning (ICL) tasks - are not invariant to simple input changes (e.g., open-ended vs. multiple-choice), suggesting they capture more than pure concepts. Using representational similarity analysis (RSA), we localize a small set of attention heads that encode invariant concept vectors (CVs) for verbal concepts like "antonym". These CVs function as feature detectors that operate independently of the final output - meaning that a model may form a correct internal representation yet still produce an incorrect output. Furthermore, CVs can be used to causally guide model behaviour. However, for more abstract concepts like "previous" and "next", we do not observe invariant linear representations, a finding we link to generalizability issues LLMs display within these domains.
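To make the notion of invariance concrete, the following is a minimal sketch of a representational similarity check on synthetic vectors. It is not the authors' pipeline: the helper names (`cosine_similarity_matrix`, `concept_separation`, `toy_vectors`), the toy data, and the "within minus between" score are all assumptions chosen for illustration. The idea is that a vector family that is invariant to presentation format should be much more similar within a concept than across concepts, whereas a format-dominated family should not.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def concept_separation(sim, concept_labels):
    """Mean same-concept similarity minus mean cross-concept similarity.

    Self-similarities on the diagonal are excluded. This score is an
    illustrative stand-in, not the statistic used in the paper.
    """
    labels = np.asarray(concept_labels)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    within = sim[same & off_diag].mean()
    between = sim[~same].mean()
    return within - between


# Synthetic stand-ins for vectors extracted from model activations
# (hypothetical data; no real model is queried here).
rng = np.random.default_rng(0)
dim = 64
concept_dirs = {"antonym": rng.normal(size=dim), "category": rng.normal(size=dim)}
format_dirs = {"open_ended": rng.normal(size=dim), "multiple_choice": rng.normal(size=dim)}


def toy_vectors(concept_weight, format_weight, n_per_cell=5):
    """Build vectors mixing a concept direction, a format direction, and noise."""
    vecs, concepts = [], []
    for concept, c_dir in concept_dirs.items():
        for f_dir in format_dirs.values():
            for _ in range(n_per_cell):
                noise = 0.1 * rng.normal(size=dim)
                vecs.append(concept_weight * c_dir + format_weight * f_dir + noise)
                concepts.append(concept)
    return np.array(vecs), concepts


# Concept-dominated vectors (invariant to format) vs. format-dominated vectors.
invariant_vecs, concepts = toy_vectors(concept_weight=1.0, format_weight=0.0)
format_vecs, _ = toy_vectors(concept_weight=0.3, format_weight=1.0)

invariant_score = concept_separation(cosine_similarity_matrix(invariant_vecs), concepts)
format_score = concept_separation(cosine_similarity_matrix(format_vecs), concepts)
print(f"invariant: {invariant_score:.2f}, format-dominated: {format_score:.2f}")
```

In this toy setup the concept-dominated vectors receive a clearly higher separation score than the format-dominated ones, which mirrors in miniature the contrast the abstract draws between CVs and FVs.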

Paper Structure

This paper contains 22 sections, 5 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Pairwise similarity matrix of $\mathcal{CV}$s extracted from Llama-3.1 70B across 600 ICL prompts covering various concepts and low-level presentations. $\mathcal{CV}$s remain invariant for the verbal concepts antonym and category, but show no stable representation of abstract concepts like previous or next. Instead, these tasks exhibit order-based representations tied to known lists (e.g., alphabets, weekdays) or low-level clustering based on presentation format (words vs. letters).
  • Figure 2: Representational similarity matrices for antonym and categorical concepts, each tested with three low-level transformations. The upper-left and lower-right quadrants (outlined with dashed lines) contain pairwise similarity scores for prompts coming from the same concept. $\mathcal{CV}$s encode the concept in a more invariant manner than $\mathcal{FV}$s.
  • Figure 3: Density plot displaying the information-rich make-up of 100 attention heads in Llama 70B comprising its $\mathcal{FV}$.
  • Figure 4: Patching activations from multiple low-level manifestations of a latent concept does not change which attention heads are ranked to have the highest causal effect nor does it help localize latent conceptual information.
  • Figure 5: Attention heads encoding verbal concepts emerge in early-to-mid layers.
  • ...and 6 more figures