Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy

Bianca Raimondi, Maurizio Gabbrielli

TL;DR

This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens, providing strong evidence that cognitive level is encoded in a linearly accessible subspace of the model's representations.

Abstract

The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens. By analyzing high-dimensional activation vectors from several LLMs, we probe whether cognitive levels, ranging from basic recall (Remember) to abstract synthesis (Create), are linearly separable within the models' residual streams. Our results demonstrate that linear classifiers achieve approximately 95% mean accuracy across all Bloom levels, providing strong evidence that cognitive level is encoded in a linearly accessible subspace of the models' representations. These findings further suggest that the model resolves the cognitive difficulty of a prompt early in the forward pass, with representations becoming increasingly separable across layers.
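Layer-wise linear probing of this kind can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the probe is a multinomial logistic regression (the abstract only says "linear classifiers"), trains it with plain NumPy gradient descent, and replaces real residual-stream activations with hypothetical synthetic vectors whose class means separate more at deeper layers. The layer indices, dimensionality, and helper names (`train_softmax_probe`, `probe_accuracy`) are illustrative assumptions.

```python
import numpy as np

# Six Bloom levels, ordered from lowest to highest cognitive complexity.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def train_softmax_probe(X, y, n_classes, lr=0.1, epochs=300, l2=1e-3):
    """Multinomial logistic regression (a linear probe) fit by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot labels
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def probe_accuracy(X, y, n_classes, rng, train_frac=0.8):
    """Random train/test split, standardize with train statistics, return held-out accuracy."""
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-8
    Xtr, Xte = (X[tr] - mu) / sd, (X[te] - mu) / sd
    W, b = train_softmax_probe(Xtr, y[tr], n_classes)
    pred = np.argmax(Xte @ W + b, axis=1)
    return float((pred == y[te]).mean())


# HYPOTHETICAL stand-in for residual-stream activations: one matrix per layer,
# with class means that drift apart as depth increases. Real activations would
# be extracted from the LLM, one vector per prompt.
rng = np.random.default_rng(0)
n_per_class, d = 100, 64
y = np.repeat(np.arange(len(BLOOM_LEVELS)), n_per_class)
base_means = rng.normal(size=(len(BLOOM_LEVELS), d))
layers = {}
for layer, scale in [(1, 0.05), (5, 0.3), (10, 1.0)]:
    layers[layer] = base_means[y] * scale + rng.normal(size=(len(y), d))

accs = {layer: probe_accuracy(X, y, len(BLOOM_LEVELS), rng) for layer, X in layers.items()}
for layer, acc in accs.items():
    print(f"layer {layer:2d}: probe accuracy = {acc:.2f}")
```

In a real run, each matrix in `layers` would hold one activation vector per prompt taken from that layer's residual stream (how tokens are pooled into a single vector is an assumption left open here), and plotting `accs` against the layer index gives a curve of the kind summarized in Figure 2.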

Paper Structure (19 sections, 5 equations, 17 figures, 3 tables)

Figures (17)

  • Figure 1: Overview of the experimental pipeline, from dataset construction and activation extraction to layer-wise linear probing.
  • Figure 2: Layer-wise probe accuracy across all evaluated models.
  • Figure 3: Probe accuracy at layer 5 across Bloom levels for four models. All architectures exhibit consistently high performance, with minor variations that reflect differences in cognitive difficulty across levels.
  • Figure 4: Representative confusion matrix of the linear probe for Llama-3.1-8B-Instruct at layer 5.
  • Figure 5: Layer-wise Euclidean distances between adjacent Bloom-level centroids for the Llama-3.1-8B-Instruct model. Distances are small in early layers and increase monotonically with depth, indicating progressive geometric disentanglement of cognitive levels. The CSO layer $l^\star=5$ marks the onset of rapid separation.
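The geometric analysis behind Figure 5 and the confusion matrix in Figure 4 can be approximated with two small helpers. This is a hedged NumPy sketch, not the authors' implementation: it assumes a centroid is the mean activation over all prompts at one Bloom level and that "adjacent" means consecutive levels in the order Remember, Understand, Apply, Analyze, Evaluate, Create. The function names and the synthetic per-layer data are hypothetical.

```python
import numpy as np

N_LEVELS = 6  # Remember, Understand, Apply, Analyze, Evaluate, Create


def adjacent_centroid_distances(X, y, n_classes=N_LEVELS):
    """Euclidean distance between the centroids of consecutive Bloom levels (k, k+1)."""
    centroids = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    return np.linalg.norm(centroids[1:] - centroids[:-1], axis=1)


def confusion_matrix(y_true, y_pred, n_classes=N_LEVELS):
    """Count matrix: rows are true Bloom levels, columns are predicted levels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


# HYPOTHETICAL per-layer activations in which class means separate with depth;
# real inputs would be residual-stream vectors extracted from the model.
rng = np.random.default_rng(1)
n_per_class, d = 100, 64
y = np.repeat(np.arange(N_LEVELS), n_per_class)
base_means = rng.normal(size=(N_LEVELS, d))
dist_by_layer = {}
for layer, scale in [(1, 0.05), (5, 0.3), (10, 1.0)]:
    X = base_means[y] * scale + rng.normal(size=(len(y), d))
    dist_by_layer[layer] = adjacent_centroid_distances(X, y)

for layer, dists in dist_by_layer.items():
    print(f"layer {layer:2d}: adjacent-centroid distances = {np.round(dists, 2)}")
```

Each layer yields five distances (one per adjacent pair of levels). Note that centroids estimated from finite samples carry sampling noise, so even perfectly overlapping classes produce small nonzero distances; comparing against a shuffled-label baseline is one way to calibrate what counts as "small".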