Grokking Finite-Dimensional Algebra

Pascal Jr Tikeng Notsawo; Guillaume Dumas; Guillaume Rabusseau

TL;DR

This work provides a unified framework for grokking across algebraic structures and new insights into how mathematical structure governs neural network generalization dynamics.

Abstract

This paper investigates the grokking phenomenon, the sudden transition from a long period of memorization to generalization observed during neural network training, in the context of learning multiplication in finite-dimensional algebras (FDA). While prior work on grokking has focused mainly on group operations, we extend the analysis to more general algebraic structures, including non-associative, non-commutative, and non-unital algebras. We show that learning group operations is a special case of learning FDA, and that learning multiplication in FDA amounts to learning a bilinear product specified by the algebra's structure tensor. For algebras over the reals, we connect the learning problem to matrix factorization with an implicit low-rank bias, and for algebras over finite fields, we show that grokking emerges naturally as models must learn discrete representations of algebraic elements. This leads us to experimentally investigate three core questions: (i) how algebraic properties such as commutativity, associativity, and unitality influence both the emergence and timing of grokking; (ii) how structural properties of the FDA's structure tensor, such as sparsity and rank, influence generalization; and (iii) to what extent generalization correlates with the model learning latent embeddings aligned with the algebra's representation. Our work provides a unified framework for grokking across algebraic structures and new insights into how mathematical structure governs neural network generalization dynamics.
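The reduction from groups to FDA can be illustrated in a few lines. The sketch below is not the authors' implementation: it assumes the structure tensor is stored as a nested list `C[i][j][k]` with product $(x \cdot y)_k = \sum_{i,j} x_i y_j \mathcal{C}_{ijk}$, and the helper names (`multiply`, `group_algebra_tensor`, `basis`) are hypothetical. For a group $G$, the group algebra has one basis vector $e_g$ per element and $\mathcal{C}_{g,h,k} = 1$ exactly when $gh = k$ (0 otherwise), so multiplying basis vectors reproduces the group operation.

```python
from itertools import product


def multiply(C, x, y, p=None):
    """Bilinear product (x*y)_k = sum_{i,j} x_i y_j C[i][j][k], optionally reduced mod p."""
    n = len(x)
    z = [sum(x[i] * y[j] * C[i][j][k] for i in range(n) for j in range(n))
         for k in range(n)]
    return [v % p for v in z] if p is not None else z


def group_algebra_tensor(elements, op):
    """Structure tensor of the group algebra in the basis {e_g}: C[g][h][k] = 1 iff op(g, h) == k."""
    index = {g: i for i, g in enumerate(elements)}
    n = len(elements)
    C = [[[0] * n for _ in range(n)] for _ in range(n)]
    for g, h in product(elements, repeat=2):
        C[index[g]][index[h]][index[op(g, h)]] = 1
    return C


def basis(i, n):
    """Standard basis vector e_i of length n."""
    return [1 if k == i else 0 for k in range(n)]


# Hypothetical example: the cyclic group Z/5Z under addition mod 5.
p = 5
C = group_algebra_tensor(list(range(p)), lambda a, b: (a + b) % p)

# e_2 * e_4 = e_{(2 + 4) mod 5} = e_1: the group operation is recovered from the bilinear product.
e2, e4 = basis(2, p), basis(4, p)
print(multiply(C, e2, e4))  # [0, 1, 0, 0, 0]
```

Because the product is bilinear, its values on basis pairs determine it everywhere, which is why a group's Cayley table and the corresponding structure tensor carry the same information.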

Paper Structure

This paper contains 41 sections, 20 theorems, 61 equations, and 13 figures.

Table of Contents

Introduction
Motivations
Contributions
Related Work
Structure of the document
Notations
Finite-Dimensional Algebra
Definitions
Structure Constants of FDA
Representations of FDA
From Groups to FDA
Learning Finite-Dimensional Algebra
Grokking Regimes in FDAs
A Linear Inverse View for $\mathbb{F}=\mathbb{R}$
Finite Fields $\mathbb{F}=\mathbb{Z}/p\mathbb{Z}$
...and 26 more sections
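The abstract's link between learning over $\mathbb{R}$ and a linear inverse problem rests on one observation: $x \cdot y$ is bilinear in $(x, y)$ but linear in $\boldsymbol{\mathcal{C}}$, since $(x \cdot y)_k = \langle \operatorname{vec}(x \otimes y), \operatorname{vec}(\mathcal{C}_{:,:,k}) \rangle$. The sketch below, which is an illustration and not the paper's method, recovers a tensor from samples $(x, y, x \cdot y)$ by exact Gauss-Jordan elimination over the rationals. The ground-truth tensor (complex multiplication), the integer sampling range, and all helper names are assumptions. It only demonstrates the counting: fewer than $n^2$ observations leave $\boldsymbol{\mathcal{C}}$ underdetermined, while $n^2$ observations with a full-rank design matrix pin it down. The low-rank implicit bias of a trained network is not modeled here.

```python
import random
from fractions import Fraction


def multiply(C, x, y):
    """(x*y)_k = sum_{i,j} x_i y_j C[i][j][k]: bilinear in (x, y) and linear in C."""
    n = len(x)
    return [sum(x[i] * y[j] * C[i][j][k] for i in range(n) for j in range(n))
            for k in range(n)]


def design_row(x, y):
    """vec(x ⊗ y): z_k is the dot product of this row with vec(C[:, :, k])."""
    return [Fraction(xi * yj) for xi in x for yj in y]


def row_reduce(A, B):
    """Exact Gauss-Jordan elimination on [A | B] over the rationals.

    Returns (rank of A, reduced right-hand side). When A is square and
    full rank, the reduced right-hand side equals A^{-1} B.
    """
    M = [list(a) + list(b) for a, b in zip(A, B)]
    ncols = len(A[0])
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        pv = M[r][c]
        M[r] = [v / pv for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r, [row[ncols:] for row in M]


random.seed(0)
n = 2
# Hypothetical ground truth: complex multiplication over R in the basis {1, i}.
C_true = [[[1, 0], [0, 1]],
          [[0, 1], [-1, 0]]]


def sample():
    """One observation: a design row vec(x ⊗ y) and the target x*y as exact rationals."""
    x = [random.randint(-3, 3) for _ in range(n)]
    y = [random.randint(-3, 3) for _ in range(n)]
    return design_row(x, y), [Fraction(v) for v in multiply(C_true, x, y)]


# Fewer than n^2 observations: rank < n^2, so C is not identifiable from the data alone.
few = [sample() for _ in range(n * n - 1)]
rank_few, _ = row_reduce([a for a, _ in few], [b for _, b in few])
print("rank with", len(few), "samples:", rank_few, "of", n * n)

# n^2 observations whose design matrix has full rank determine C exactly.
while True:
    data = [sample() for _ in range(n * n)]
    rank_full, sol = row_reduce([a for a, _ in data], [b for _, b in data])
    if rank_full == n * n:
        break
C_hat = [[sol[i * n + j] for j in range(n)] for i in range(n)]
print(C_hat == C_true)  # True
```

In the underdetermined case, many tensors fit the observed products equally well; choosing among them is where an implicit bias, such as a preference for low rank, would come into play.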

Key Result

Proposition 2.1

For all $\boldsymbol{\mathcal{C}} \in \mathbb{F}^{n \times n \times n}$, there exists an $n\mathbb{F}$-FDA whose structure tensor is $\boldsymbol{\mathcal{C}}$ in some basis. Moreover, all algebras that have $\boldsymbol{\mathcal{C}}$ as a structure tensor in one of their …
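Proposition 2.1 implies that any tensor can serve as a multiplication table, so algebraic properties such as commutativity, associativity, and unitality become checkable conditions on $\boldsymbol{\mathcal{C}}$ itself. The sketch below is illustrative only: it assumes the convention $(x \cdot y)_k = \sum_{i,j} x_i y_j \mathcal{C}_{ijk}$ over $\mathbb{Z}/7\mathbb{Z}$, the function names are hypothetical, and the unit search is brute force, which is feasible only for very small $p^n$.

```python
from itertools import product

P = 7  # work over Z/7Z


def multiply(C, x, y, p=P):
    """(x*y)_k = sum_{i,j} x_i y_j C[i][j][k] mod p."""
    n = len(x)
    return [sum(x[i] * y[j] * C[i][j][k] for i in range(n) for j in range(n)) % p
            for k in range(n)]


def basis(i, n):
    """Standard basis vector e_i of length n."""
    return [1 if k == i else 0 for k in range(n)]


def is_commutative(C, p=P):
    """x*y == y*x for all x, y  <=>  C[i][j][k] == C[j][i][k] (mod p) for all i, j, k."""
    n = len(C)
    return all((C[i][j][k] - C[j][i][k]) % p == 0
               for i, j, k in product(range(n), repeat=3))


def is_associative(C, p=P):
    """By trilinearity, checking (e_i e_j) e_k == e_i (e_j e_k) on basis triples suffices."""
    n = len(C)
    for i, j, k in product(range(n), repeat=3):
        ei, ej, ek = basis(i, n), basis(j, n), basis(k, n)
        left = multiply(C, multiply(C, ei, ej, p), ek, p)
        right = multiply(C, ei, multiply(C, ej, ek, p), p)
        if left != right:
            return False
    return True


def find_unit(C, p=P):
    """Search all p^n elements for u with u*e_i == e_i == e_i*u for every basis vector
    (enough by linearity). Returns u, or None if the algebra is non-unital."""
    n = len(C)
    for u in product(range(p), repeat=n):
        u = list(u)
        if all(multiply(C, u, basis(i, n), p) == basis(i, n) == multiply(C, basis(i, n), u, p)
               for i in range(n)):
            return u
    return None


# Complex numbers over Z/7Z in the basis {1, i}, with i*i = -1 = 6 (mod 7).
C_complex = [[[1, 0], [0, 1]],
             [[0, 1], [6, 0]]]
# A hypothetical arbitrary tensor whose only nonzero basis product is e_0 * e_1 = e_0.
C_odd = [[[0, 0], [1, 0]],
         [[0, 0], [0, 0]]]

print(is_commutative(C_complex), is_associative(C_complex), find_unit(C_complex))  # True True [1, 0]
print(is_commutative(C_odd), is_associative(C_odd), find_unit(C_odd))  # False False None
```

The second tensor shows why the framework covers non-commutative, non-associative, and non-unital algebras: nothing constrains $\boldsymbol{\mathcal{C}}$ beyond its shape.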

Figures (13)

Figure 1: Representation quality and generalization performance as a function of training steps ($r=0.5$). Before grokking, $\mathcal{A}_{\text{rep}}$ remains relatively low; it then sharply transitions to $\approx 1$ as the model groks.
Figure 2: Generalization accuracy as a function of representation quality for different training data sizes ($r$) and model layers ($0$ for the first layer, $1$ for the second, etc.): $\mathcal{A}_{\text{test}}$ increases with $\mathcal{A}_{\text{rep}}$ in a low-data regime.
Figure 3: Histogram of grokking delay under single-entry perturbations of the structure tensor $\boldsymbol{\mathcal{C}}^*$ of the complex numbers over $\mathbb{Z}/7\mathbb{Z}$.
Figure 4: Evolution of the test accuracy $\mathcal{A}_{\text{test}}$ during training for training data fractions $r \in \{0.2, 0.3\}$.
Figure 5: Test loss $\mathcal{L}_{\text{test}}$ and grokking step $t_4$ as a function of the training data fraction $r$.
...and 8 more figures
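Figures 3 to 5 involve two experimental knobs: single-entry perturbations of the structure tensor $\boldsymbol{\mathcal{C}}^*$ of the complex numbers over $\mathbb{Z}/7\mathbb{Z}$, and the training data fraction $r$. The sketch below shows one way to set both up, under two assumptions that are ours and not confirmed by the source: (i) a single-entry perturbation replaces exactly one of the $2^3 = 8$ entries of $\boldsymbol{\mathcal{C}}^*$ by a different value of $\mathbb{Z}/7\mathbb{Z}$, and (ii) the dataset is all pairs $(x, y)$ of the $7^2 = 49$ algebra elements labelled by $x \cdot y$, of which a random fraction $r$ is used for training. Function names are hypothetical, and the paper's protocol may differ.

```python
import random
from itertools import product

P = 7  # complex numbers over Z/7Z: basis {1, i}, i*i = -1 = 6 (mod 7)
C_STAR = [[[1, 0], [0, 1]],
          [[0, 1], [6, 0]]]


def multiply(C, x, y, p=P):
    """(x*y)_k = sum_{i,j} x_i y_j C[i][j][k] mod p, returned as a tuple."""
    n = len(x)
    return tuple(sum(x[i] * y[j] * C[i][j][k] for i in range(n) for j in range(n)) % p
                 for k in range(n))


def single_entry_perturbations(C, p=P):
    """Yield ((i, j, k), new_value, perturbed_tensor) for every way of changing exactly one
    entry of C to a different value in Z/pZ (assumed meaning of 'single-entry perturbation')."""
    n = len(C)
    for i, j, k in product(range(n), repeat=3):
        for v in range(p):
            if v == C[i][j][k]:
                continue
            C2 = [[list(col) for col in row] for row in C]  # deep copy of the nested lists
            C2[i][j][k] = v
            yield (i, j, k), v, C2


def make_split(C, r, seed=0, p=P):
    """All pairs (x, y) of algebra elements labelled by x*y, shuffled and split so that
    a fraction r of the pairs is used for training (assumed dataset construction)."""
    elements = list(product(range(p), repeat=len(C)))
    data = [(x, y, multiply(C, x, y, p)) for x in elements for y in elements]
    rng = random.Random(seed)
    rng.shuffle(data)
    n_train = int(r * len(data))
    return data[:n_train], data[n_train:]


perturbed = list(single_entry_perturbations(C_STAR))
print(len(perturbed))  # 48
train, test = make_split(C_STAR, r=0.3)
print(len(train), len(test))  # 720 1681
```

Under these assumptions there are $8 \times 6 = 48$ perturbed tensors, and $r = 0.3$ keeps $720$ of the $2401$ pairs for training.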

Theorems & Definitions (61)

Example 2.1
Proposition 2.1
Proposition 2.2
proof
Proposition 2.3
proof
Example 2.2
Proposition 3.1
proof
Proposition 5.1
...and 51 more
