Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models

Yixuan Liu, Zhiyuan Ma, Likai Tang, Runmin Gan, Xinche Zhang, Jinhao Li, Chao Xie, Sen Song

TL;DR

This work investigates how training reshapes the internal geometry of Transformer representations and how these geometric changes relate to brain alignment during language processing. By tracking entropy $E^{(l,s)}$ and curvature $C^{(l,s)}$ across Pythia layers and checkpoints, the authors reveal a stable modularization into low- and high-complexity layers, with the low-complexity module consistently yielding higher fMRI encoding scores across the left-language network. Crucially, curvature emerges as a robust predictor of brain alignment—even after accounting for training progress—and this curvature–alignment coupling strengthens with model scale. The findings suggest that training-driven representational smoothing facilitates neural-like processing and that geometry offers a mechanistic, scalable lens to understand model–brain alignment beyond traditional linguistic features.

Abstract

How large language models (LLMs) align with the neural representation and computation of human language is a central question in cognitive science. Using representational geometry as a mechanistic lens, we addressed this by tracking entropy, curvature, and fMRI encoding scores throughout Pythia (70M-1B) training. We identified a geometric modularization where layers self-organize into stable low- and high-complexity clusters. The low-complexity module, characterized by reduced entropy and curvature, consistently better predicted human language network activity. This alignment followed heterogeneous spatial-temporal trajectories: rapid and stable in temporal regions (AntTemp, PostTemp), but delayed and dynamic in frontal areas (IFG, IFGorb). Crucially, reduced curvature remained a robust predictor of model-brain alignment even after controlling for training progress, an effect that strengthened with model scale. These results link training-driven geometric reorganization to temporal-frontal functional specialization, suggesting that representational smoothing facilitates neural-like linguistic processing.

Paper Structure (33 sections, 4 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Overview of the experimental framework. We employ the TUCKUTE2024 dataset (1,000 sentences) to obtain both Pythia layer activations and human fMRI responses. Layer activations are mapped to subject-averaged fMRI data using ridge regression (5-fold CV). In parallel, we track representational geometry (entropy $E$ and curvature $C$) across training checkpoints to cluster model layers into modules, analyzing the relationship between these geometric module trajectories and fMRI predictability (Pearson correlation $r$).
  • Figure 2: Geometric evolution and modularization of Pythia-1B layers. (a) Heatmaps of entropy (top) and curvature (bottom). The horizontal axis represents training checkpoints (log-scale, steps 1–143k), and the vertical axis corresponds to layers (0: embeddings, 1–16: Transformer blocks). Darker colors indicate lower values. (b) Aggregated geometry trajectories of the two identified layer clusters. The low-complexity module (blue) develops a stable low-entropy/curvature profile, while the high-complexity module (red) maintains higher values. The orange line shows the difference between the two modules.
  • Figure 3: fMRI alignment time course by geometry module. (a) Mean cross-validated fMRI encoding scores (Pearson correlation) for the low-complexity module (blue) and high-complexity module (red) across six language ROIs over training checkpoints. (b) Module gap $\Delta F$ over training (blue line: mean across layers; shaded: SD across layers at each checkpoint). The dotted line denotes zero. Temporal ROIs show an earlier, more stable advantage for the low-complexity module, whereas frontal ROIs exhibit a more gradual and dynamic separation.
  • Figure 4: Geometric co-evolution and conditional coupling with brain alignment. (a) Scatter plots relating mean geometric metrics (Curvature/Entropy, averaged across layers) within the low-complexity module to fMRI encoding scores across checkpoints ($n=19$). (b) Standardized regression coefficients ($\beta_G$) and 95% CIs from $fMRI \sim G + \log(t) + \alpha_l$, where $G$ is z-scored Curvature or Entropy and $\alpha_l$ denotes layer fixed effects. Robust SEs are clustered by checkpoint; significance is based on BH-FDR correction across 6 ROIs for each metric ($q<0.05$).
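
The encoding score described in the Figure 1 caption (ridge regression from layer activations to subject-averaged fMRI responses, 5-fold cross-validation, Pearson correlation $r$) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the fixed penalty `alpha=1.0`, the shuffled fold assignment, and the synthetic data are all hypothetical, and the captions do not say how the regularization strength was chosen or how sentence-level activations were pooled.

```python
import numpy as np

def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights for centered X (n, d) and centered Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def pearson_columns(A, B):
    """Pearson r between matching columns of A and B (both shaped (n, v))."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom

def encoding_score(X, Y, alpha=1.0, n_folds=5, seed=0):
    """Mean cross-validated Pearson r for each response column of Y.

    X: (n_sentences, n_features) activations from one layer at one checkpoint.
    Y: (n_sentences, n_rois) fMRI responses.
    alpha, n_folds and seed are illustrative defaults (assumptions).
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, Y_tr = X[train_mask], Y[train_mask]
        # Center with training-fold statistics only, to avoid test leakage.
        x_mu, y_mu = X_tr.mean(axis=0), Y_tr.mean(axis=0)
        W = ridge_fit(X_tr - x_mu, Y_tr - y_mu, alpha)
        Y_hat = (X[test_idx] - x_mu) @ W + y_mu
        scores.append(pearson_columns(Y_hat, Y[test_idx]))
    return np.mean(scores, axis=0)

# Synthetic stand-in data (assumption): 200 "sentences", 16 features, 6 "ROIs".
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
W_true = rng.standard_normal((16, 6))
Y = X @ W_true + 0.5 * rng.standard_normal((200, 6))
scores = encoding_score(X, Y)
print(scores.round(3))
```

Running `encoding_score` once per layer and checkpoint would give the kind of layer-by-checkpoint score grid that the module-level curves in Figure 3 appear to average over. In practice the penalty would normally be tuned, for example by nested cross-validation, rather than fixed.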