Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models

Yixuan Liu; Zhiyuan Ma; Likai Tang; Runmin Gan; Xinche Zhang; Jinhao Li; Chao Xie; Sen Song

TL;DR

This work investigates how training reshapes the internal geometry of Transformer representations and how these geometric changes relate to brain alignment during language processing. By tracking entropy $E^{(l,s)}$ and curvature $C^{(l,s)}$ across Pythia layers and checkpoints, the authors reveal a stable modularization into low- and high-complexity layers, with the low-complexity module consistently yielding higher fMRI encoding scores across the left-language network. Crucially, curvature emerges as a robust predictor of brain alignment—even after accounting for training progress—and this curvature–alignment coupling strengthens with model scale. The findings suggest that training-driven representational smoothing facilitates neural-like processing and that geometry offers a mechanistic, scalable lens to understand model–brain alignment beyond traditional linguistic features.

Abstract

How large language models (LLMs) align with the neural representation and computation of human language is a central question in cognitive science. Using representational geometry as a mechanistic lens, we addressed this question by tracking entropy, curvature, and fMRI encoding scores throughout Pythia (70M-1B) training. We identified a geometric modularization in which layers self-organize into stable low- and high-complexity clusters. The low-complexity module, characterized by reduced entropy and curvature, consistently predicted human language network activity better than the high-complexity module. This alignment followed heterogeneous spatial-temporal trajectories: rapid and stable in temporal regions (AntTemp, PostTemp), but delayed and dynamic in frontal areas (IFG, IFGorb). Crucially, reduced curvature remained a robust predictor of model-brain alignment even after controlling for training progress, an effect that strengthened with model scale. These results link training-driven geometric reorganization to temporal-frontal functional specialization, suggesting that representational smoothing facilitates neural-like linguistic processing.
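The fMRI encoding score referred to above is described in the Figure 1 caption as ridge regression with 5-fold cross-validation, scored by Pearson correlation. The following is a minimal sketch of that idea, not the authors' implementation: the function name `ridge_cv_score`, the penalty `alpha=1.0`, the shuffled fold assignment, and the synthetic data standing in for layer activations and ROI responses are all assumptions made for illustration.

```python
import numpy as np

def ridge_cv_score(X, y, alpha=1.0, n_folds=5, seed=0):
    """Mean held-out Pearson r between ridge predictions and y.

    X: (n_sentences, n_features) layer activations (hypothetical input)
    y: (n_sentences,) response of one ROI (hypothetical input)
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        X_te, y_te = X[test_idx], y[test_idx]
        # Center with training statistics only, so no test data leaks in.
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean
        # Closed-form ridge solution: (Xc'Xc + alpha * I)^-1 Xc'(y - mean)
        gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(gram, Xc.T @ (y_tr - y_mean))
        pred = (X_te - x_mean) @ w + y_mean
        scores.append(np.corrcoef(pred, y_te)[0, 1])
    return float(np.mean(scores))

# Synthetic stand-ins: 1,000 "sentences", 32 "features", one linear ROI.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
w_true = rng.normal(size=32)
y = X @ w_true + rng.normal(scale=1.0, size=1000)
score = ridge_cv_score(X, y)
```

In a real analysis the penalty would normally be tuned (for example by nested cross-validation), and the score would be computed per layer, per checkpoint, and per ROI to produce trajectories like those the abstract describes.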

Paper Structure (33 sections, 4 equations, 4 figures, 1 table)

This paper contains 33 sections, 4 equations, 4 figures, 1 table.

Introduction
The Present Study
Methods
Brain Data and Model
Brain Data and Stimuli.
Model.
Representation Geometry Analysis
Entropy.
Curvature.
Layer-level Aggregation.
Clustering and Robustness Analysis
Geometry-Trajectory Features for Clustering.
K-means Clustering and Robustness of Layer Modules.
Geometry-fMRI Coupling Analysis.
Results
...and 18 more sections

Figures (4)

Figure 1: Overview of the experimental framework. We employ the TUCKUTE2024 dataset (1,000 sentences) to obtain both Pythia layer activations and human fMRI responses. Layer activations are mapped to subject-averaged fMRI data using ridge regression (5-fold CV). In parallel, we track representational geometry (entropy $E$ and curvature $C$) across training checkpoints to cluster model layers into modules, analyzing the relationship between these geometric module trajectories and fMRI predictability (Pearson correlation $r$).
Figure 2: Geometric evolution and modularization of Pythia-1B layers. (a) Heatmaps of entropy (top) and curvature (bottom). The horizontal axis represents training checkpoints (log-scale, steps 1–143k), and the vertical axis corresponds to layers (0: embeddings, 1–16: Transformer blocks). Darker colors indicate lower values. (b) Aggregated geometry trajectories of the two identified layer clusters. The low-complexity module (blue) develops a stable low-entropy/curvature profile, while the high-complexity module (red) maintains higher values. The orange line shows the difference between the two modules.
Figure 3: fMRI alignment time course by geometry module. (a) Mean cross-validated fMRI encoding scores (Pearson correlation) for the low-complexity module (blue) and high-complexity module (red) across six language ROIs over training checkpoints. (b) Module gap $\Delta F$ over training (blue line: mean across layers; shaded: SD across layers at each checkpoint). The dotted line denotes zero. Temporal ROIs show an earlier, more stable advantage for the low-complexity module, whereas frontal ROIs exhibit a more gradual and dynamic separation.
Figure 4: Geometric co-evolution and conditional coupling with brain alignment. (a) Scatter plots relating mean geometric metrics (Curvature/Entropy, averaged across layers) within the low-complexity module to fMRI encoding scores across checkpoints ($n=19$). (b) Standardized regression coefficients ($\beta_G$) and 95% CIs from $fMRI \sim G + \log(t) + \alpha_l$, where $G$ is z-scored Curvature or Entropy and $\alpha_l$ denotes layer fixed effects. Robust SEs are clustered by checkpoint; significance is based on BH-FDR correction across 6 ROIs for each metric ($q<0.05$).
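
The conditional model in the Figure 4 caption, $fMRI \sim G + \log(t) + \alpha_l$ with BH-FDR correction across ROIs, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: `standardized_beta` and `bh_reject` are hypothetical names, only $G$ is z-scored (as the caption states), the checkpoint-clustered robust standard errors are omitted, and both the data and the p-values are synthetic.

```python
import numpy as np

def standardized_beta(fmri, g, step, layer):
    """OLS estimate of beta_G in fmri ~ z(G) + log(t) + layer fixed effects."""
    g_z = (g - g.mean()) / g.std()
    layers = np.unique(layer)
    # One dummy column per layer (alpha_l); together they absorb the intercept.
    dummies = (layer[:, None] == layers[None, :]).astype(float)
    design = np.column_stack([g_z, np.log(step), dummies])
    coef, *_ = np.linalg.lstsq(design, fmri, rcond=None)
    return float(coef[0])

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean reject mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        k = np.max(np.nonzero(passed)[0])
        reject[order[: k + 1]] = True
    return reject

# Synthetic example: 19 checkpoints x 4 layers with a known coefficient of -0.5.
rng = np.random.default_rng(0)
steps = np.repeat(np.geomspace(1, 143000, 19), 4)
layer = np.tile(np.arange(4), 19)
g = rng.normal(size=steps.size)
g_z = (g - g.mean()) / g.std()
fmri = (-0.5 * g_z + 0.1 * np.log(steps) + 0.2 * layer
        + rng.normal(scale=0.01, size=steps.size))
beta = standardized_beta(fmri, g, steps, layer)

# Hypothetical p-values for six ROIs (not taken from the paper).
mask = bh_reject([0.001, 0.008, 0.039, 0.041, 0.0415, 0.6], q=0.05)
```

Including $\log(t)$ as a covariate is what lets the coefficient on $G$ be read as coupling beyond shared training progress. Reproducing the reported confidence intervals would additionally require a cluster-robust (sandwich) variance estimator grouped by checkpoint, which this sketch leaves out.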
