Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition

Kenzo Clauw; Sebastiano Stramaglia; Daniele Marinazzo

TL;DR

This work tackles grokking, the abrupt generalization of neural networks after extended memorization, by applying higher-order information theory to quantify inter-neuronal interactions. Using the multivariate measure $\Omega_{n}$, defined as $\Omega_{n}(\mathbf{Z}) = (n - 2)H(\mathbf{Z}) + \sum_{j=1}^{n} [H(Z_{j}) - H(\mathbf{Z} \setminus Z_{j})]$, where $H$ denotes entropy and $\mathbf{Z} \setminus Z_{j}$ is the set of all variables except $Z_{j}$, the authors distinguish synergy from redundancy among neuron activations and show that grokking corresponds to an emergent phase transition with three phases: Feature Learning, Emergence, and Decoupling. They demonstrate that weight decay and initialization modulate the emergence phase, and that early synergy peaks help predict grokking. The study highlights a shift toward emergentism in interpretability and notes limitations arising from the toy setup and the computational cost of higher-order information estimates, outlining directions for scaling and stronger causal validation. Overall, the paper provides a framework to diagnose and predict grokking through higher-order interactions and contributes to understanding how collective neuronal dynamics drive generalization.
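To make the formula for $\Omega_{n}$ concrete, here is a minimal sketch that evaluates it on small discrete toy data with a plug-in entropy estimate. It is an illustration only and is not the authors' estimator: neuron activations are continuous, and the summary does not say how the paper estimates entropies. The function names `entropy` and `o_information` and the two toy datasets are assumptions introduced for this example. Under the usual sign convention for this measure, negative values indicate synergy-dominated interactions and positive values indicate redundancy-dominated ones.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def o_information(rows):
    """Evaluate Omega_n = (n - 2) H(Z) + sum_j [H(Z_j) - H(Z without Z_j)].

    `rows` is a list of equally weighted observations, each a tuple of n
    discrete values (hypothetical toy data, not real neuron activations).
    """
    n = len(rows[0])
    total = (n - 2) * entropy([tuple(r) for r in rows])
    for j in range(n):
        h_single = entropy([r[j] for r in rows])
        h_rest = entropy([tuple(r[:j]) + tuple(r[j + 1:]) for r in rows])
        total += h_single - h_rest
    return total


# Toy case 1: third variable is the XOR of two fair bits (purely synergistic).
xor_rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

# Toy case 2: three identical copies of one fair bit (purely redundant).
copy_rows = [(a, a, a) for a in (0, 1)]

omega_xor = o_information(xor_rows)
omega_copy = o_information(copy_rows)
print(f"XOR triplet:  {omega_xor:+.3f} bits")
print(f"Copy triplet: {omega_copy:+.3f} bits")
```

For the XOR triplet the joint entropy is 2 bits, each marginal is 1 bit, and each pair has 2 bits, so the formula gives $(3-2)\cdot 2 + 3\cdot(1-2) = -1$ bit. For the copied bit every entropy equals 1 bit, so the result is $+1$ bit.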

Abstract

This paper studies emergent phenomena in neural networks by focusing on grokking, where models suddenly generalize after delayed memorization. To understand this phase transition, we use higher-order mutual information to analyze the collective behavior (synergy) and shared properties (redundancy) between neurons during training. We identify distinct phases before grokking, which allows us to anticipate when it occurs. We attribute grokking to an emergent phase transition caused by the synergistic interactions between neurons as a whole. We show that weight decay and weight initialization can enhance the emergent phase.

Paper Structure (15 sections, 1 equation, 9 figures)

Introduction
Related Work
Methodology
Evaluating grokking
Grokking is an emergent phase transition
Low weight decay delays the emergence phase
Increasing weight decay enhances emergence
The role of weight initialization
Early synergy peaks predict grokking
Are the emergent synergistic sub-networks causally related to generalization?
Limitations
Conclusion and Discussion
Author Contributions
Training results for baseline model with weight decay
Training results for baseline model with weight initialization alpha

Figures (9)

Figure 1: Left: accuracy for weight decay 0.1 and 2.0. Right: accuracy for alpha initialization 8.
Figure 2: Baseline model with weight decay 0.1
Figure 3: Baseline model with weight decay 2.0
Figure 4: Baseline model with alpha = 8 and weight decay = 0
Figure 5: Left: synergy for weight decay (0.1, 2, 10, 50). Right: synergy for alpha (1, 8, 50).
...and 4 more figures
