
Triple Phase Transitions: Understanding the Learning Dynamics of Large Language Models from a Neuroscience Perspective

Yuko Nakagi, Keigo Tada, Sota Yoshino, Shinji Nishimoto, Yu Takagi

TL;DR

The paper addresses why abrupt emergent abilities arise in large language models during training. It develops a three-perspective framework (brain encoding alignment, probing of internal representations, and downstream benchmarking) to map learning dynamics across models and languages. The key finding is a consistent three-phase progression (Brain Alignment and Instruction Following; Brain Detachment and Stagnation; Brain Realignment and Consolidation) that co-occurs with shifts in internal representations and downstream performance, with language coverage shaping the timing and expression of these phases. This work demonstrates that brain activity can serve as a concrete, biologically grounded benchmark to guide the safe, interpretable development of future LLMs and highlights avenues for integrating neuroscience insights into AI evaluation.

Abstract

Large language models (LLMs) often exhibit abrupt emergent behavior, whereby new abilities arise at certain points during training. This phenomenon, commonly referred to as a "phase transition", remains poorly understood. In this study, we conduct an integrative analysis of such phase transitions by examining three interconnected perspectives: the similarity between LLMs and the human brain, the internal states of LLMs, and downstream task performance. We propose a novel interpretation of the learning dynamics of LLMs that vary in both training data and architecture, revealing that three phase transitions commonly emerge across these models during training: (1) alignment with the entire brain surges as LLMs begin adhering to task instructions (Brain Alignment and Instruction Following); (2) unexpectedly, LLMs diverge from the brain during a period in which downstream task accuracy temporarily stagnates (Brain Detachment and Stagnation); and (3) alignment with the brain reoccurs as LLMs become capable of solving the downstream tasks (Brain Realignment and Consolidation). These findings illuminate the underlying mechanisms of phase transitions in LLMs and open new avenues for interdisciplinary research bridging AI and neuroscience.
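A common way to quantify the similarity between an LLM and the brain is a voxelwise encoding model: a regularized linear regression maps the LLM's activations for each stimulus onto measured fMRI responses, and accuracy is the correlation between predicted and measured responses on held-out data, averaged over voxels for summaries such as the red lines in Figure 2. The sketch below assumes that setup with closed-form ridge regression and a fixed penalty `alpha`. The helper names (`solve`, `ridge_fit`, `pearson`, `encoding_accuracy`) and the synthetic data are hypothetical; details such as hemodynamic delays, penalty selection, and cross-validation folds are not stated in this summary and are omitted.

```python
import random


def solve(a, b):
    """Solve the square linear system a @ x = b by Gauss-Jordan elimination
    with partial pivoting (pure Python, small systems only)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(x, y, alpha):
    """Closed-form ridge weights w = (X^T X + alpha * I)^(-1) X^T y, without an
    intercept (assumes features and responses are roughly zero-mean)."""
    d = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(x, y)) for i in range(d)]
    return solve(xtx, xty)


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def encoding_accuracy(train_x, train_y, test_x, test_y, alpha=1.0):
    """Fit one ridge model per voxel and score it by the Pearson r between
    predicted and measured held-out responses. Returns (per-voxel r, mean r)."""
    n_voxels = len(train_y[0])
    scores = []
    for v in range(n_voxels):
        w = ridge_fit(train_x, [row[v] for row in train_y], alpha)
        pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in test_x]
        scores.append(pearson(pred, [row[v] for row in test_y]))
    return scores, sum(scores) / len(scores)


# Synthetic stand-in (hypothetical): 5 "LLM features" per stimulus and 3 "voxels"
# that respond linearly to them plus noise. Real data would be layer activations and fMRI.
rng = random.Random(0)
N_FEAT, N_VOX = 5, 3
true_w = [[rng.gauss(0, 1) for _ in range(N_VOX)] for _ in range(N_FEAT)]


def simulate(n):
    x = [[rng.gauss(0, 1) for _ in range(N_FEAT)] for _ in range(n)]
    y = [[sum(row[f] * true_w[f][v] for f in range(N_FEAT)) + rng.gauss(0, 0.1)
          for v in range(N_VOX)] for row in x]
    return x, y


train_x, train_y = simulate(200)
test_x, test_y = simulate(100)
per_voxel_r, mean_r = encoding_accuracy(train_x, train_y, test_x, test_y, alpha=1.0)
print("per-voxel r:", [round(r, 3) for r in per_voxel_r])
print(f"mean encoding accuracy: {mean_r:.3f}")
```

Repeating this fit with the activations of each training checkpoint and plotting the mean r against the number of training tokens would give a curve of the kind the paper tracks across training; the per-checkpoint scores are what the phase labels are read from.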

Paper Structure

This paper contains 38 sections, 2 equations, 33 figures, 2 tables.

Figures (33)

  • Figure 1: Overview of the study. (a) Brain encoding analysis. (b) Probing analysis (top) and benchmark analysis (bottom). (c) Three phase-transition phenomena during the learning process of LLMs, as identified through the results of the encoding, probing, and benchmark analyses. Red, green, and blue lines indicate encoding, probing, and benchmark accuracies, respectively, across all LLM checkpoints.
  • Figure 2: Learning dynamics of LLMs exhibiting three phase transitions. The horizontal axis denotes the number of training tokens. The vertical axis denotes the average encoding accuracy across all voxels of a single participant (DM06) (red lines), the benchmark accuracy (blue lines), and the average probing accuracy across all LLM neurons computed on MMLU (green lines). We select layers 25, 30, and 25 from OLMo-2, OLMo-0724, and LLM-jp, respectively, to capture the transitions that occur at each phase of the learning dynamics. The background color indicates the LLM phase. The legend indicates whether the model has sufficiently learned the language. Except for LLM-jp, no checkpoints before 10^9 training tokens have been made publicly available.
  • Figure 3: Changes in the relationship with the brain. Differences in encoding accuracy between checkpoints for three participants (DM06, DM03, and DM07), projected onto the inflated (top, lateral, and medial views) and flattened cortical surfaces (occipital areas at the center; flattened view shown only for DM06), for both the left and right hemispheres. The OLMo-2 checkpoints were chosen to capture the transitions at each phase of the learning dynamics. Brain regions with significant accuracy at either of the two checkpoints are colored (p < 0.05, FDR-corrected). Voxels with higher accuracy at the later checkpoint are shown in red, whereas those with higher accuracy at the earlier checkpoint are shown in blue.
  • Figure 4: Probing results. (a) Evolution of probing accuracy for English and Japanese MMLU throughout training, assessed at layers 5, 15, 25, and 30 of OLMo-2. The horizontal axis indicates the probing accuracy; the vertical axis indicates the number of neurons falling within each 0.01 accuracy bin. The legend indicates the number of training tokens. (b) Relationship between the probing accuracies of OLMo-2 neurons (layer 25) on English MMLU, CSQA, and ARC. Each axis denotes the probing accuracy for the respective task, and the color gradient reflects the number of training tokens. The legend shows the correlation coefficient for each pair of tasks.
  • Figure 5: The Nature of Activations. (a) Variations in correlation coefficients of the activations of OLMo-2 across checkpoints. (b) IDs (purple line) and average encoding accuracy for all voxels of a single participant (DM06) (red line) across checkpoints.
  • ...and 28 more figures
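Figure 4 summarizes probing as one accuracy per neuron: panel (a) histograms those accuracies in 0.01-wide bins, and panel (b) correlates them across tasks. This summary does not specify the probe itself, so the sketch below uses a hypothetical stand-in: a one-neuron threshold classifier on binary labels, fit on a training split and scored on a test split. The two synthetic tasks share the same label-informative neurons, an assumption made only so that a cross-task correlation is visible; all names (`threshold_probe_accuracy`, `accuracy_histogram`, `probe_task`) and data are illustrative.

```python
import math
import random
from collections import Counter


def threshold_probe_accuracy(train_acts, train_labels, test_acts, test_labels):
    """Hypothetical single-neuron probe: pick the threshold and direction that best
    separate binary labels on the training split, then score the test split."""
    best_acc, best_t, best_sign = -1.0, 0.0, 1
    for t in sorted(set(train_acts)):
        for sign in (1, -1):
            preds = [int(sign * (a - t) >= 0) for a in train_acts]
            acc = sum(p == y for p, y in zip(preds, train_labels)) / len(train_labels)
            if acc > best_acc:
                best_acc, best_t, best_sign = acc, t, sign
    preds = [int(best_sign * (a - best_t) >= 0) for a in test_acts]
    return sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)


def accuracy_histogram(accs, bin_width=0.01):
    """Count neurons per accuracy bin; keys are the lower bin edges."""
    # The small epsilon keeps values such as 0.57 from landing in the 0.56 bin
    # because of floating-point rounding in the division.
    counts = Counter(math.floor(a / bin_width + 1e-9) for a in accs)
    return {round(b * bin_width, 4): c for b, c in sorted(counts.items())}


def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Synthetic example: each neuron's activation shifts with the label by `strength`.
rng = random.Random(0)
N_NEURONS, N_TRAIN, N_TEST = 40, 80, 80
strength = [rng.uniform(0.0, 2.0) for _ in range(N_NEURONS)]


def simulate_split(n):
    """Return per-neuron activations (neurons x samples) and binary labels."""
    labels = [rng.randint(0, 1) for _ in range(n)]
    acts = [[s * (2 * y - 1) + rng.gauss(0, 1) for y in labels] for s in strength]
    return acts, labels


def probe_task():
    """Probe every neuron on one synthetic task and return its test accuracies."""
    tr_acts, tr_y = simulate_split(N_TRAIN)
    te_acts, te_y = simulate_split(N_TEST)
    return [threshold_probe_accuracy(tr_acts[i], tr_y, te_acts[i], te_y)
            for i in range(N_NEURONS)]


task_a, task_b = probe_task(), probe_task()
hist_a = accuracy_histogram(task_a)
r_ab = pearson(task_a, task_b)
print("task A histogram (bin lower edge: neuron count):", hist_a)
print(f"cross-task correlation of per-neuron probing accuracy: {r_ab:.2f}")
```

Replacing the stand-in with a different per-neuron probe, such as a logistic regression, would only change `threshold_probe_accuracy`; the binning and cross-task correlation steps stay the same.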