Functional Abstraction of Knowledge Recall in Large Language Models

Zijian Wang, Chang Xu

TL;DR

The paper proposes that knowledge recall in large language models can be understood as a functional process, with activation vectors serving as input arguments, function bodies, and return values. It introduces an activation-patching-driven framework to identify subject, relation, and object representations and validates their independent roles via counter-knowledge testing and vector interchange, grounded in causal mediation analysis. The authors then leverage this functional insight to improve contextual knowledge editing through targeted activation patches, enabling more reliable short-term memory updates for new facts. Overall, the work demonstrates localized, stage-wise encoding of knowledge and presents a promising path for both interpretability and rapid, non-parametric knowledge editing in LLMs.
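To make the function analogy concrete, here is a toy Python sketch; it is not the authors' code and makes strong simplifying assumptions. Subjects are hypothetical one-hot vectors (input arguments), each relation is assumed to be a linear map (function body), and the object is read out by nearest match (return value). Swapping one component while holding the other fixed mimics the spirit of the counter-knowledge interchange test. Real LLM activations are neither one-hot nor guaranteed to act linearly, and the names `SUBJECTS`, `RELATIONS`, `OBJECTS`, and `recall` are invented for illustration.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical subject activations (input arguments): one-hot in a 2-d space.
SUBJECTS: Dict[str, Vector] = {"France": [1.0, 0.0], "Japan": [0.0, 1.0]}

# Hypothetical relation activations (function bodies), assumed here to be
# linear maps from the 2-d subject space into a 4-d object space.
RELATIONS: Dict[str, Matrix] = {
    "capital": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
    "language": [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
}

# Hypothetical object directions (return values) used for read-out.
OBJECTS: Dict[str, Vector] = {
    "Paris": [1.0, 0.0, 0.0, 0.0],
    "Tokyo": [0.0, 1.0, 0.0, 0.0],
    "French": [0.0, 0.0, 1.0, 0.0],
    "Japanese": [0.0, 0.0, 0.0, 1.0],
}


def apply(body: Matrix, argument: Vector) -> Vector:
    """Execute the 'function body' on the 'input argument' (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, argument)) for row in body]


def decode(hidden: Vector) -> str:
    """Read out the 'return value' as the object direction with the largest dot product."""
    return max(OBJECTS, key=lambda name: sum(a * b for a, b in zip(OBJECTS[name], hidden)))


def recall(subject: str, relation: str) -> str:
    """(subject, relation) -> object, executed as relation(subject)."""
    return decode(apply(RELATIONS[relation], SUBJECTS[subject]))


# Interchange one component at a time and keep the other fixed.
print(recall("France", "capital"))   # Paris
print(recall("Japan", "capital"))    # Tokyo: only the argument was swapped
print(recall("France", "language"))  # French: only the function body was swapped
```

In this toy, each prediction changes only along the swapped component, which is the behavior the counter-knowledge testing looks for in real activations.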

Abstract

Pre-trained transformer large language models (LLMs) demonstrate strong knowledge recall capabilities. This paper investigates the knowledge recall mechanism in LLMs by abstracting it into a functional structure. We propose that during knowledge recall, the model's hidden activation space implicitly carries out a function-execution process in which specific activation vectors align with functional components (input argument, function body, and return value). Specifically, activation vectors of relation-related tokens define a mapping function from subjects to objects, with subject-related token activations serving as input arguments and object-related token activations as return values. For experimental verification, we first design a patching-based knowledge-scoring algorithm to identify knowledge-aware activation vectors as independent functional components. Then, we conduct counter-knowledge testing to examine the independent functional effects of each component on knowledge recall outcomes. From this functional perspective, we improve contextual knowledge editing by augmenting it with activation patching. By rewriting incoherent activations in context, we improve short-term memory retention of new knowledge supplied through prompting.
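This summary does not spell out the exact knowledge-scoring formula, so the following is only a causal-tracing-style sketch in the spirit of Figure 2(a), under stated assumptions. The subject embeddings are corrupted with Gaussian noise, one clean activation at a time is restored, and each (layer, position) is scored by how much probability of the target object it recovers, assumed here to be P_restored - P_corrupted. `ToyModel`, `knowledge_scores`, `noise_scale`, and the random weights are hypothetical. The toy model stores no real facts, so only the mechanics are meaningful, not the numbers.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Patch = Tuple[int, int, Vector]  # (layer, position, replacement activation)


class ToyModel:
    """Hypothetical stand-in for a causal LM: random weights, causal mean-mixing, tanh."""

    def __init__(self, vocab: Sequence[str], dim: int = 8, n_layers: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.dim, self.n_layers = dim, n_layers
        self.embed = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in vocab}
        self.weights = [
            [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_layers)
        ]
        self.unembed = {w: [rng.gauss(0, 1) for _ in range(dim)] for w in vocab}

    def run(
        self,
        tokens: Sequence[str],
        noise: Optional[Dict[int, Vector]] = None,
        patch: Optional[Patch] = None,
    ) -> Tuple[List[List[Vector]], Dict[str, float]]:
        """Return activations for layers 0..n_layers and the next-token distribution."""
        h = [list(self.embed[t]) for t in tokens]
        for pos, eps in (noise or {}).items():  # corrupt the input embeddings
            h[pos] = [a + b for a, b in zip(h[pos], eps)]
        states: List[List[Vector]] = []
        for layer in range(self.n_layers + 1):
            if patch is not None and patch[0] == layer:  # overwrite one activation
                h[patch[1]] = list(patch[2])
            states.append([list(v) for v in h])
            if layer == self.n_layers:
                break
            mat = self.weights[layer]
            nxt = []
            for t in range(len(h)):
                # Causal mixing: position t only sees positions 0..t.
                ctx = [sum(h[s][i] for s in range(t + 1)) / (t + 1) for i in range(self.dim)]
                x = [a + b for a, b in zip(h[t], ctx)]
                nxt.append([math.tanh(sum(m * xi for m, xi in zip(row, x))) for row in mat])
            h = nxt
        logits = {w: sum(a * b for a, b in zip(u, h[-1])) for w, u in self.unembed.items()}
        top = max(logits.values())
        exps = {w: math.exp(v - top) for w, v in logits.items()}
        total = sum(exps.values())
        return states, {w: e / total for w, e in exps.items()}


def knowledge_scores(model, tokens, subject_positions, target, noise_scale=3.0, seed=1):
    """Score each (layer, position) as P(target | corrupted + one clean activation
    restored) - P(target | corrupted). Assumed metric, not necessarily the paper's."""
    rng = random.Random(seed)
    noise = {p: [rng.gauss(0, noise_scale) for _ in range(model.dim)] for p in subject_positions}
    clean_states, clean_probs = model.run(tokens)
    _, corrupt_probs = model.run(tokens, noise=noise)
    scores = {}
    for layer in range(model.n_layers + 1):
        for pos in range(len(tokens)):
            restored = (layer, pos, clean_states[layer][pos])
            _, probs = model.run(tokens, noise=noise, patch=restored)
            scores[(layer, pos)] = probs[target] - corrupt_probs[target]
    return scores, clean_probs[target], corrupt_probs[target]


tokens = ["The", "capital", "of", "France", "is"]
model = ToyModel(vocab=tokens + ["Paris", "Tokyo"])
scores, p_clean, p_corrupt = knowledge_scores(model, tokens, subject_positions=[3], target="Paris")
best = max(scores, key=scores.get)
print(f"P(clean)={p_clean:.3f}  P(corrupt)={p_corrupt:.3f}  top (layer, pos)={best}")
```

Because mixing is causal, positions before the subject never see the noise, so their scores are exactly zero. Restoring the subject embedding at layer 0, or the last position at the final layer, recovers the clean probability in full. With a real model, the same loop would patch hidden states through framework hooks instead of a hand-written forward pass.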

Paper Structure

This paper contains 44 sections, 2 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: Illustration of our abstraction framework. In a knowledge recall process, i.e., (subject, relation) → object, we find that knowledge-related representations are locally distributed and well aligned with functional components.
  • Figure 2: Illustration of two activation patching operations. In (a), corrupted activations are obtained by adding noise to the input embeddings, and individual corrupted activations are then patched with their clean counterparts. In (b), the activations of the source knowledge are patched with those from the reference knowledge.
  • Figure 3: Illustration of the counter-knowledge testing experiments. If the predicted object after the interchange changes only according to the exchanged knowledge, this indicates that each knowledge vector acts as an independent functional component.
  • Figure 4: Score heat map visualization. The left column uses the template "Given <subject>, the <relation> is", and the right column uses "The <relation> of <subject> is".
  • Figure 5: Locality along token positions. This bar graph shows the proportion of high-score (>0.05) activation vectors in different token ranges.
  • ...and 3 more figures
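Figure 5 reports the proportion of high-score (>0.05) activation vectors in different token ranges. Below is a minimal Python sketch of that bookkeeping. It assumes the proportion is each range's share of all (layer, position) vectors scoring above the threshold. The function name `locality_shares`, the range labels, the token positions for a prompt following the template "Given <subject>, the <relation> is", and the scores are all hypothetical toy values, not results from the paper.

```python
from typing import Dict, Tuple


def locality_shares(
    scores: Dict[Tuple[int, int], float],
    token_ranges: Dict[str, range],
    threshold: float = 0.05,
) -> Dict[str, float]:
    """Fraction of high-score activation vectors whose token position lies in each range.

    `scores` maps (layer, position) -> knowledge score. Assumed definition: the
    denominator is the total number of vectors scoring above `threshold`.
    """
    high = [pos for (_layer, pos), s in scores.items() if s > threshold]
    if not high:
        return {name: 0.0 for name in token_ranges}
    return {
        name: sum(1 for pos in high if pos in span) / len(high)
        for name, span in token_ranges.items()
    }


# Hypothetical toy scores keyed by (layer, token position).
toy_scores = {
    (0, 1): 0.30, (1, 1): 0.20,  # subject token
    (2, 4): 0.08,                # relation token
    (3, 5): 0.40, (2, 5): 0.10,  # last token
    (0, 0): 0.01, (1, 3): 0.02,  # below threshold, ignored
}
ranges = {"subject": range(1, 2), "relation": range(3, 5), "last": range(5, 6)}
print(locality_shares(toy_scores, ranges))  # {'subject': 0.4, 'relation': 0.2, 'last': 0.4}
```

Using Python `range` objects keeps the membership test cheap and lets a token range span several positions, as multi-token subjects or relations would.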