MindCraft: How Concept Trees Take Shape In Deep Models

Bowei Tian, Yexiao He, Wanghao Ye, Ziyao Wang, Meng Liu, Ang Li

TL;DR

The MindCraft framework, built upon Concept Trees, is widely applicable and enables in-depth analysis of conceptual representations in deep models, marking a significant step forward in the foundations of interpretable AI.

Abstract

Large-scale foundation models demonstrate strong performance across language, vision, and reasoning tasks. However, how they internally structure and stabilize concepts remains elusive. Inspired by causal inference, we introduce the MindCraft framework built upon Concept Trees. By applying spectral decomposition at each layer and linking principal directions into branching Concept Paths, Concept Trees reconstruct the hierarchical emergence of concepts, revealing when they diverge from shared representations into linearly separable subspaces. Empirical evaluations on scenarios from multiple disciplines, including medical diagnosis, physics reasoning, and political decision-making, show that Concept Trees recover semantic hierarchies and disentangle latent concepts across domains. Concept Trees thus provide a widely applicable framework for in-depth analysis of conceptual representations in deep models, marking a significant step forward in the foundations of interpretable AI.
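
To make the "spectral decomposition at each layer, then link principal directions" idea concrete, here is a minimal NumPy sketch. It is not the paper's algorithm: it assumes the spectral step is an SVD of mean-centered hidden states and that directions in consecutive layers are linked by largest absolute cosine similarity. The function names `top_directions` and `link_directions` and the parameter `k` are hypothetical.

```python
import numpy as np

def top_directions(hidden, k=2):
    """Top-k principal directions of a (tokens x dim) hidden-state matrix.

    Assumption: "spectral decomposition" is approximated here by an SVD
    of the mean-centered activations; rows of the result are unit vectors.
    """
    centered = hidden - hidden.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def link_directions(layers, k=2):
    """Link each direction to its most similar direction in the previous layer.

    Assumption: similarity is the absolute cosine, so a sign flip of a
    singular vector does not break a link.
    """
    dirs = [top_directions(h, k) for h in layers]
    links = []
    for prev, curr in zip(dirs, dirs[1:]):
        sim = np.abs(curr @ prev.T)       # (k x k) matrix of |cosine| values
        links.append(sim.argmax(axis=1))  # parent index for each direction
    return dirs, links
```

Under these assumptions, two directions in one layer that share the same parent index would correspond to a branch point, and following parent indices back through the layers would trace one path through the tree.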

Paper Structure

This paper contains 37 sections, 13 equations, 10 figures.

Figures (10)

  • Figure 1: Overview of the MindCraft algorithm. $X$ is the input sequence; for layer $N$, $H_N$ is the attention output sequence after the residual connection, $A_N$ is the attention weight, and $V_N$ is the value matrix. We first perform an intervention on a specific input token. Then, leveraging the attention mechanism, we compare the counterfactual and original representations at the last token. This difference reveals the layer at which concepts separate within the model.
  • Figure 2: Change in $\cos(V_{L(\Delta x)}^{(-1)},V_L^{(-1)})$, showing a sudden amplification of the conceptual difference.
  • Figure 3: Concept Trees constructed from six scenarios: (a–b) medical diagnosis, (c) daily life, (d) personality evaluation, (e) physics reasoning, and (f) political decision-making. Counterfactual tokens are marked with underlines, and n denotes the number of remaining unbranched concepts. The results demonstrate the broad applicability of the Concept Tree, revealing a structured and hierarchical organization of conceptual reasoning within the model.
  • Figure 4: Concept Highlight Experiment. This experiment shows that the Concept Tree captures semantic interpretability rather than relying solely on static measures such as input or latent embeddings.
  • Figure 5: Layer-wise analysis of attention weights, value vectors, and representations. Counterfactual Pair 1 is "Pretend you're an honest (untruthful) person making statements about the world.", Counterfactual Pair 2 is "Describe a fair (biased) scenario that you have seen.", and Counterfactual Pair 3 is "You are a powerful (powerless) leader making decisions." The consistent propagation patterns observed across tasks show that concept formation follows a robust hierarchical dynamic within deep networks.
  • ...and 5 more figures
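
The comparison described in the captions of Figures 1 and 2 (original versus counterfactual last-token vectors, tracked layer by layer) can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the inputs are assumed to be pre-extracted last-token vectors stacked as `(num_layers, dim)` arrays, and "sudden amplification" is approximated by the largest layer-to-layer drop in cosine similarity. The names `layerwise_cosine` and `separation_layer` are hypothetical.

```python
import numpy as np

def layerwise_cosine(orig, counterfactual):
    """Cosine similarity per layer between two (num_layers, dim) arrays."""
    num = np.sum(orig * counterfactual, axis=1)
    den = np.linalg.norm(orig, axis=1) * np.linalg.norm(counterfactual, axis=1)
    return num / den

def separation_layer(cos_sims):
    """Index of the layer with the largest drop from the previous layer.

    Assumption: the layer where concepts separate is the one where the
    cosine similarity falls most sharply relative to the layer before it.
    """
    drops = cos_sims[:-1] - cos_sims[1:]
    return int(np.argmax(drops)) + 1

# Toy data: the two vectors stay aligned, then become orthogonal at layer 2.
orig = np.tile(np.array([1.0, 0.0]), (4, 1))
cf = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]])
layer = separation_layer(layerwise_cosine(orig, cf))  # layer == 2
```

In practice the vectors would come from a model's per-layer outputs for the original and intervened prompts; how those are extracted depends on the model library and is outside this sketch.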

Theorems & Definitions (2)

  • Definition 1: Concept Path
  • Definition 2: Concept Tree