Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph

Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini

TL;DR

The paper examines how LLMs internally encode factual knowledge during claim verification. It introduces an end-to-end framework that uses activation patching to decode this latent information into ground predicates, then represents the results as a dynamic knowledge graph that evolves across model layers. Applying the approach to FEVER and CLIMATE-FEVER with a 7B LLaMA2 model, the authors demonstrate both local interpretability (entity centrality and multi-hop reasoning) and global interpretability (layer-wise evolution patterns and transition points). The key contributions are: (i) an activation-patching pipeline that converts token representations into structured facts without training, (ii) a graph-based representation that captures how knowledge evolves from layer to layer, and (iii) analyses showing how factual information shifts from word-level to claim-level facts and how representation errors can lead to incorrect evaluations. These insights contribute to mechanistic interpretability and offer a framework for diagnosing the factual knowledge resolution process in LLMs, with practical implications for bias and reliability.
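One way to picture a knowledge graph that "evolves across model layers" is as one set of facts per layer, compared layer by layer. The sketch below is only an illustration under that assumption: the triples, the predicate names, and the helper functions `layer_diff`, `degree_centrality`, and `evolution` are hypothetical and do not come from the paper.

```python
from collections import Counter

# Hypothetical decoded facts per layer, stored as (subject, predicate, object)
# triples. The values are invented for illustration only.
facts_by_layer = {
    1: {("Paris", "IsA", "Word")},
    2: {("Paris", "IsA", "City")},
    3: {("Paris", "IsA", "City"), ("Paris", "CapitalOf", "France")},
}

def layer_diff(prev, curr):
    """Return (added, removed) facts between two consecutive layers."""
    return curr - prev, prev - curr

def degree_centrality(facts):
    """Count how many facts each entity takes part in, as subject or object."""
    counts = Counter()
    for subj, _pred, obj in facts:
        counts[subj] += 1
        counts[obj] += 1
    return counts

def evolution(facts_by_layer):
    """Yield (layer, added, removed) for each layer in ascending order."""
    prev = set()
    for layer in sorted(facts_by_layer):
        curr = facts_by_layer[layer]
        added, removed = layer_diff(prev, curr)
        yield layer, added, removed
        prev = curr

for layer, added, removed in evolution(facts_by_layer):
    print(layer, sorted(added), sorted(removed))
```

Layers where many facts are added or removed at once are natural candidates for the transition points mentioned above, and the per-entity counts give a simple proxy for entity centrality.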

Abstract

Large Language Models (LLMs) demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and internal mechanisms in exploiting this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we rely on neither training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. The global analysis, on the other hand, reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.
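The abstract describes activation patching as altering a token representation during inference. A minimal sketch of that idea on a toy model follows; the `layer` and `forward` functions, the two-dimensional vectors, and the choice of layer and token positions are all assumptions made for illustration, not the authors' implementation or a real LLM.

```python
def layer(hidden):
    """Toy layer: mix every token vector with the mean over all tokens."""
    dim = len(hidden[0])
    mean = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    return [[h[d] + 0.5 * mean[d] for d in range(dim)] for h in hidden]

def forward(embeddings, num_layers, patch=None):
    """Run the toy model and return the hidden states after every layer.

    patch = (layer_index, token_position, vector) overwrites one token's
    hidden state at the output of that layer before inference continues.
    """
    hidden = [list(v) for v in embeddings]
    states = []
    for idx in range(num_layers):
        hidden = layer(hidden)
        if patch is not None and patch[0] == idx:
            _, pos, vec = patch
            hidden[pos] = list(vec)
        states.append([list(h) for h in hidden])
    return states

# Hypothetical "source" and "target" prompts, given directly as embeddings.
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[0.0, 0.0], [2.0, 2.0]]

# 1) Run the source prompt and capture token 0 at layer index 1.
captured = forward(source, num_layers=3)[1][0]

# 2) Run the target prompt, overwriting token 1 at the same layer.
patched = forward(target, num_layers=3, patch=(1, 1, captured))
clean = forward(target, num_layers=3)

print(patched[2], clean[2])  # later layers differ once the patch is applied
```

In a real model the same pattern is usually implemented with forward hooks on the chosen layer; the toy version only shows the order of operations: capture on the source run, overwrite on the target run, then let inference continue.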

Paper Structure

This paper contains 27 sections, 3 equations, 14 figures, and 4 tables.

Figures (14)

  • Figure 1: Illustrative insights from unveiling the process of factual knowledge resolution within an LLM using the proposed patching-based framework.
  • Figure 2: The patching-based framework decodes the factual knowledge from LLM latent representations. The outputs are represented in a dynamic knowledge graph.
  • Figure 3: Inference of $\mathcal{M}$ on the source prompt $\mathcal{S}$.
  • Figure 4: Patching operation during the inference of model $\mathcal{M}$ on the target prompt $\mathcal{T}$.
  • Figure 5: Example of the binary weights for the input's tokens $\mathcal{I} \subset \mathcal{S}$. These weights are then used to combine the tokens' vector representations via a weighted sum.
  • ...and 9 more figures
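The caption of Figure 5 mentions binary weights over the input tokens $\mathcal{I} \subset \mathcal{S}$ that are used to combine the tokens' vector representations via a weighted sum. A minimal sketch of that operation is shown below, assuming the input occupies a contiguous span of the source prompt and that the sum is not normalized; the helper names `binary_weights` and `weighted_sum` and the example vectors are hypothetical.

```python
def binary_weights(num_tokens, input_span):
    """Return 1 for token positions inside the input span, 0 elsewhere.

    input_span is a half-open (start, end) range over source-prompt positions.
    """
    start, end = input_span
    return [1 if start <= i < end else 0 for i in range(num_tokens)]

def weighted_sum(token_vectors, weights):
    """Combine per-token vectors into one vector, one weight per token."""
    if len(token_vectors) != len(weights):
        raise ValueError("one weight per token is required")
    dim = len(token_vectors[0])
    return [
        sum(w * vec[d] for vec, w in zip(token_vectors, weights))
        for d in range(dim)
    ]

# Five source-prompt tokens; positions 2 and 3 are assumed to be the input.
vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [9.0, 9.0]]
weights = binary_weights(len(vectors), (2, 4))
summary = weighted_sum(vectors, weights)
print(weights, summary)
```

With binary weights, the result is simply the sum of the input tokens' vectors, while the surrounding prompt tokens contribute nothing.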