Locating and Extracting Relational Concepts in Large Language Models

Zijian Wang, Britney White, Chang Xu

TL;DR

This work addresses the interpretability gap around relational concepts in large language models. Using causal mediation analysis, the authors locate relational representations in hidden states at the last token position, specifically in the stage of last-token processing they call Relational Emergence. They validate these representations through hidden states transplantation and zero-shot relational reasoning, showing that the extracted representations act as faithful, robust entity connectors and can be used to controllably rewrite model outputs through relation rewriting. The approach requires no parameter updates and offers a practical mechanism for steering fact recall and open-ended generation at inference time, with potential impact on explainability and controllable AI systems.

Abstract

Relational concepts are foundational to the structure of knowledge representation, as they facilitate the association between various entity concepts, allowing us to express and comprehend complex world knowledge. By expressing relational concepts in natural language prompts, people can effortlessly interact with large language models (LLMs) and recall desired factual knowledge. However, the process of knowledge recall lacks interpretability, and the representations of relational concepts within LLMs remain unknown. In this paper, we identify hidden states that can express entity and relational concepts through causal mediation analysis in fact recall processes. Our findings reveal that at the last token position of the input prompt, there are hidden states that solely express the causal effects of relational concepts. Based on this finding, we assume that these hidden states can be treated as relational representations, and we successfully extract them from LLMs. The experimental results demonstrate the high credibility of the relational representations: they can be flexibly transplanted into other fact recall processes and can also be used as robust entity connectors. Moreover, we show that the relational representations exhibit significant potential for controllable fact recall through relation rewriting.
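To make the idea of causal mediation over hidden states more concrete, the following is a minimal sketch on a toy two-position network, not the authors' implementation and not a real LLM. Everything in it is an assumption made for illustration: the layer update rule, the weights `W` and `U`, the `READOUT` vector, and the helper names `run` and `indirect_effect`. It runs a "clean" prompt and a prompt with a corrupted relation, then transplants one clean hidden state at a time into the corrupted run and records how much of the clean output is restored. Because the toy corrupts only the relation input, the restoration pattern follows from its construction and does not reproduce the layer-wise stages reported in the paper.

```python
import math
import random

random.seed(0)
DIM, LAYERS = 4, 6
SUBJECT, LAST = 0, 1  # token positions in the toy prompt (hypothetical layout)

def rand_matrix():
    return [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(DIM)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Toy weights (assumptions): W updates each position, U lets LAST read SUBJECT.
W = [rand_matrix() for _ in range(LAYERS)]
U = [rand_matrix() for _ in range(LAYERS)]
READOUT = [random.uniform(-1, 1) for _ in range(DIM)]

def run(subject_emb, relation_emb, patch=None):
    """Forward pass of the toy network.

    patch = (layer, position, vector) overwrites one hidden state after `layer`.
    Returns (score, states), where states[layer][position] is a hidden state.
    """
    h = [list(subject_emb), list(relation_emb)]
    states = []
    for layer in range(LAYERS):
        s_in, l_in = h[SUBJECT], h[LAST]
        ws, wl = matvec(W[layer], s_in), matvec(W[layer], l_in)
        us = matvec(U[layer], s_in)
        new_s = [x + math.tanh(a) for x, a in zip(s_in, ws)]
        new_l = [x + math.tanh(a + b) for x, a, b in zip(l_in, wl, us)]
        h = [new_s, new_l]
        if patch is not None and patch[0] == layer:
            h[patch[1]] = list(patch[2])  # transplant the stored hidden state
        states.append([list(h[SUBJECT]), list(h[LAST])])
    score = sum(r * x for r, x in zip(READOUT, h[LAST]))
    return score, states

subject = [random.uniform(-1, 1) for _ in range(DIM)]
relation = [random.uniform(-1, 1) for _ in range(DIM)]
other_relation = [random.uniform(-1, 1) for _ in range(DIM)]

clean_score, clean_states = run(subject, relation)
corrupt_score, _ = run(subject, other_relation)
total_effect = clean_score - corrupt_score

def indirect_effect(layer, position):
    """Output change from restoring one clean hidden state in the corrupted run."""
    patched_score, _ = run(
        subject, other_relation,
        patch=(layer, position, clean_states[layer][position]),
    )
    return patched_score - corrupt_score

effects = {
    (layer, pos): indirect_effect(layer, pos)
    for layer in range(LAYERS)
    for pos in (SUBJECT, LAST)
}

for layer in range(LAYERS):
    print(layer, round(effects[(layer, SUBJECT)], 6), round(effects[(layer, LAST)], 6))
```

In this toy, restoring the last-position state at any layer recovers the full clean output, while restoring the subject-position state changes nothing, since the subject is identical in both runs. In a real model the same measurement would be taken per layer and per position on actual hidden states, and the interesting result is where the effect concentrates.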

Paper Structure (40 sections, 2 equations, 11 figures, 13 tables)

Figures (11)

  • Figure 1: Our motivating observation in a fact recall process. At the last position, hidden states in shallow layers solely express the relational causal effect, which inspires us to treat these hidden states as relational representations.
  • Figure 2: The Mediating Effect Visualization. We take "Given banana, the color of this fruit is" as an example for illustration. We observe that at the last position, the mediating effect of the relation solely emerges in shallow layers, and then the mediating effect of the subject emerges in deep layers.
  • Figure 3: The average and variance area plot of the mediating effects of relations and subjects at the last position, across all relation types. We divide all layers into three stages to describe different patterns of causal effects of subjects and relations.
  • Figure 4: The illustration of hidden states transplantation, which includes a sliding pointer to dynamically indicate the layer range.
  • Figure 5: The reciprocal prediction rank of target and reference objects. We select 4 relation types for illustration. Red circles denote successful predictions of target objects, while black circles denote successful predictions of reference objects. The colored lines represent 5 target object predictions, and the black line represents the reference object prediction.
  • ...and 6 more figures