GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory

David D. Baek; Ziming Liu; Max Tegmark

TL;DR

GenEFT addresses the statics and dynamics of neural-network generalization by combining an information-theoretic bound on required data with a dynamic Interacting Repon Theory that ties encoder/decoder learning-rate balance to memorization-generalization phase transitions. The static component centers on a description-length bound $b=\log_2 \frac{n!}{|\operatorname{Aut}(G)|}$ and a crude critical-data fraction $p_c$, while the dynamic component introduces repons as interacting representations, yielding Theorems 1–3 and a practical bound on generalizable learning $p_r$. The framework is validated on knowledge-graph autoencoding tasks with $n=30$ across multiple relations, revealing a Goldilocks decoder regime and phase diagrams that align with theory. By linking data geometry and learning dynamics through a physics-inspired lens, GenEFT offers a principled pathway to bridge theory and practice in model generalization and informs data and learning-rate decisions in graph-based settings.
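As a rough illustration of the static bound, the sketch below evaluates $b=\log_2 \frac{n!}{|\operatorname{Aut}(G)|}$ and the transition scale $b/n^2$ for $n=30$. The function names are hypothetical, and the automorphism counts are assumptions made for this example rather than values taken from the paper: a strict total order (greater-than) is assumed to have only the identity automorphism, and the modulo-3 equivalence relation is assumed to have $(10!)^3 \cdot 3!$ automorphisms (permutations within each residue class, times permutations of the three classes).

```python
import math


def description_length_bits(n: int, aut_order: int) -> float:
    """Return b = log2(n! / |Aut(G)|) in bits.

    `aut_order` is the size of the automorphism group of the relation
    graph; it must be supplied by the caller (not computed here).
    """
    return math.log2(math.factorial(n)) - math.log2(aut_order)


def transition_scale(n: int, aut_order: int) -> float:
    """Return the crude transition scale b / n^2 (a training-data fraction)."""
    return description_length_bits(n, aut_order) / n**2


n = 30

# Assumption: greater-than is a strict total order, so |Aut(G)| = 1.
aut_greater_than = 1

# Assumption: modulo-3 equivalence on 30 elements has three classes of
# size 10, giving |Aut(G)| = (10!)^3 * 3!.
aut_mod3 = math.factorial(10) ** 3 * math.factorial(3)

b_gt = description_length_bits(n, aut_greater_than)
b_mod3 = description_length_bits(n, aut_mod3)

print(f"greater-than: b = {b_gt:.1f} bits, b/n^2 = {transition_scale(n, aut_greater_than):.3f}")
print(f"modulo 3:     b = {b_mod3:.1f} bits, b/n^2 = {transition_scale(n, aut_mod3):.3f}")
```

Under these assumptions, the more symmetric relation (modulo 3) needs fewer bits than the total order, so its estimated transition sits at a smaller training-data fraction.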

Abstract

We present GenEFT: an effective theory framework for shedding light on the statics and dynamics of neural network generalization, and illustrate it with graph learning examples. We first investigate the generalization phase transition as data size increases, comparing experimental results with information-theory-based approximations. We find generalization in a Goldilocks zone where the decoder is neither too weak nor too powerful. We then introduce an effective theory for the dynamics of representation learning, where latent-space representations are modeled as interacting particles (repons), and find that it explains our experimentally observed phase transition between generalization and overfitting as encoder and decoder learning rates are scanned. This highlights the power of physics-inspired effective theories for bridging the gap between theoretical predictions and practice in machine learning.

Paper Structure (11 sections, 16 equations, 8 figures, 4 tables)

Introduction
Problem Formulation
Minimal Data Amount for Generalization
Interacting Repon Theory: Critical Learning Rates for Generalization
Conclusion
List of Default Hyperparameter Values
Definition of Phases
Derivation of Eq. (\ref{eq:gradient_flow})
Monte Carlo Simulation Details
Optimal Model Complexity for Generalization
Additional Phase Diagrams

Figures (8)

Figure 1: Summary of GenEFT: Physics-inspired effective theory for understanding statics and dynamics of model generalization.
Figure 2: Numerical experiments (solid line) and the analytic formula in Eq. \ref{eff:formula} (dotted line) for the theoretical upper bound of prediction accuracy $f_{\rm UB}$ as a function of training data fraction.
Figure 3: Prediction accuracy vs. training data fraction for learning the relations (a) equivalent modulo 3, and (b) greater-than. Applying inductive bias to the decoder MLP architecture (making it simpler while still capable of succeeding) improves performance, reducing the "induction gap" relative to the theoretical upper bound. In the legend, $n_h$ is the number of hidden layers, $w$ is the width of the hidden layers, mode indicates how the two embeddings are combined, and the dimension is that of the embedding space. Vertical lines show the approximate transition scale $b/n^2$.
Figure 4: Phase diagram of interacting repon dynamics in classification problems. The two axes are $a_2$ and $c$, which parameterize the decoder weights and the distance between two representations, respectively. Their formal definitions can be found in the main text. In this phase diagram, the green area represents initializations that lead to representation collision, thereby enabling generalizable representation learning, while the red area corresponds to regions that lead to overfitting or memorization.
Figure 5: Phase diagram as a function of the encoder learning rate $\eta_{\textrm{enc}}$ and decoder learning rate $\eta_{\textrm{dec}}$ for (a) modulo 3, (b) greater-than, and (c) complete bipartite relations. All panels show a slope-1 boundary between the generalization and memorization phases, as predicted by the interacting repon theory. Green, yellow, and purple regions indicate the generalization, delayed generalization (grokking), and memorization phases, respectively.
...and 3 more figures
