Categorical Invariants of Learning Dynamics

Abdulrahman Tamim

TL;DR

The paper reframes neural network learning as a structure-preserving functor $\mathcal{L}: \mathbf{Param} \to \mathbf{Rep}$, linking parameter-space trajectories to representation evolution and revealing invariants such as homotopy classes that constrain generalization. It integrates homotopy theory, persistence, and universal properties (limits and colimits) into a 2-categorical framework $\mathbf{Learn}$, and introduces practical algorithms for detecting homotopy classes, computing persistence diagrams, and performing pullback-based transfer. Empirically, it shows MNIST networks cluster into a small number of homotopy classes with stable generalization within classes, and CIFAR-10 persistence correlates strongly with generalization ($R^2=0.82$). Transfer learning is recast as a pullback construction, enabling efficient knowledge transfer and explaining cross-domain effectiveness; natural-gradient and Fisher-enriched geometry provide a principled view of optimization paths. The work highlights practical tools and open questions for scaling the categorical framework, and suggests deep connections to related fields, offering a principled language for why and how learning works beyond parameter values alone.

Abstract

Neural network training is typically viewed as gradient descent on a loss surface. We propose a fundamentally different perspective: learning is a structure-preserving transformation (a functor $\mathcal{L}$) between the space of network parameters ($\mathbf{Param}$) and the space of learned representations ($\mathbf{Rep}$). This categorical framework reveals that different training runs producing similar test performance often belong to the same homotopy class (continuous deformation family) of optimization paths. We show experimentally that networks converging via homotopic trajectories generalize within 0.5% accuracy of each other, while non-homotopic paths differ by over 3%. The theory provides practical tools: persistent homology identifies stable minima predictive of generalization ($R^2 = 0.82$), pullback constructions formalize transfer learning, and 2-categorical structures explain when different optimization algorithms yield functionally equivalent models. These categorical invariants offer both theoretical insight into why deep learning works and concrete algorithmic principles for training more robust networks.

Paper Structure

This paper contains 43 sections, 4 theorems, 15 equations, 5 figures.

Key Result

Theorem 2.6

There exists a functor $\mathcal{L}: \mathbf{Param} \to \mathbf{Rep}$ mapping parameter trajectories to representation paths while preserving composition (illustrated in Figure 1).
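
For orientation, the generic axioms that any functor between these categories must satisfy can be written out as below. These are the standard category-theoretic laws, not a reconstruction of the paper's specific definition of $\mathcal{L}$. The paths $\gamma_1, \gamma_2$ follow the notation of Figure 1, and $\theta$ is assumed here to denote an object of $\mathbf{Param}$.

$$
\mathcal{L}(\gamma_2 \circ \gamma_1) = \mathcal{L}(\gamma_2) \circ \mathcal{L}(\gamma_1),
\qquad
\mathcal{L}(\mathrm{id}_{\theta}) = \mathrm{id}_{\mathcal{L}(\theta)}
$$

The first law is the statement in Figure 1 that training along $\gamma_1$ and then $\gamma_2$ induces the same representation change as composing the two individual changes.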

Figures (5)

  • Figure 1: The learning functor $\mathcal{L}$ maps parameter trajectories (blue) to representation paths (red), preserving composition. Training sequentially ($\gamma_1$ then $\gamma_2$) induces the same representation change as composing individual changes.
  • Figure 2: Two optimization paths $\gamma_0$ (fast, upper) and $\gamma_1$ (slow, lower) connecting $\theta_A$ to $\theta_B$. If the region between them contains no high-loss barriers, they are homotopic. The gray dashes illustrate intermediate paths in the homotopy $H(s, t)$.
  • Figure 3: A 2-morphism $H: \gamma_0 \Rightarrow \gamma_1$ in $\mathbf{Learn}$. The double arrow indicates a transformation between 1-morphisms (paths). This encodes the statement that training procedures $\gamma_0$ and $\gamma_1$ are homotopically equivalent.
  • Figure 4: Persistence diagram for a loss landscape. Each point $(b_i, d_i)$ represents a local minimum. Distance from the diagonal measures persistence: far points (blue) correspond to flat, stable minima; near points (red) are sharp, unstable minima.
  • Figure 5: Generalization gap vs. total persistence for 50 ResNet-18 networks on CIFAR-10. Strong negative correlation: higher persistence (flatter minima) predicts lower generalization gap.
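
To make the $(b_i, d_i)$ points of Figure 4 concrete, here is a minimal sketch of 0-dimensional sublevel-set persistence for a loss sampled along a one-dimensional slice. It is an illustration under stated assumptions, not the paper's algorithm: the function names `sublevel_persistence_1d` and `total_persistence` and the toy double-well loss are hypothetical, and a real loss landscape is high-dimensional, so this only applies to a 1D cut through it.

```python
import numpy as np

def sublevel_persistence_1d(values):
    """0-dimensional sublevel-set persistence of a sampled 1D function.

    Returns (birth, death) pairs, one per persistent local minimum.
    The global minimum never dies and is paired with death = inf.
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    parent = [-1] * n   # -1 marks samples not yet added to the filtration
    birth = {}          # sample index -> value at which its component was born
    pairs = []

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep the threshold upward, adding samples from lowest to highest value.
    for i in np.argsort(values, kind="stable"):
        i = int(i)
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component born later dies at this level.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:   # drop zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old

    root = find(int(np.argmin(values)))
    pairs.append((birth[root], float("inf")))
    return pairs

def total_persistence(pairs):
    """Sum of (death - birth) over the finite pairs."""
    return sum(d - b for b, d in pairs if np.isfinite(d))

# Hypothetical tilted double-well loss along a 1D slice.
xs = np.linspace(-2.0, 2.0, 401)
loss_slice = (xs**2 - 1.0) ** 2 + 0.3 * xs
diagram = sublevel_persistence_1d(loss_slice)
print(diagram, total_persistence(diagram))
```

Each finite pair corresponds to a local minimum that is eventually absorbed by a deeper one. A larger `death - birth` means the minimum sits behind a taller barrier, which is the sense of "distance from the diagonal" in Figure 4.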

Theorems & Definitions (36)

  • Definition 2.1: Parameter Category $\mathbf{Param}$
  • Example 2.2: MNIST Training Trajectories
  • Remark 2.3
  • Definition 2.4: Representation Category $\mathbf{Rep}$
  • Example 2.5: ResNet-18 Representation Evolution on CIFAR-10
  • Theorem 2.6: Learning Functor
  • Proof: Proof sketch
  • Example 2.7: Computing $\mathcal{L}$ for MNIST
  • Definition 3.1: Homotopy of Optimization Paths
  • Example 3.2: Homotopic Training Runs on CIFAR-10
  • ...and 26 more
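
The idea behind Definition 3.1 and Figure 2, that two paths with shared endpoints are homotopic when the region between them contains no high-loss barrier, can be sketched numerically. The code below is a heuristic illustration, not the paper's detection algorithm: it only tries the straight-line homotopy $H(s, t) = (1 - s)\,\gamma_0(t) + s\,\gamma_1(t)$, so a failed check does not prove that no other homotopy exists. The names `max_loss_over_homotopy` and `looks_homotopic`, the `barrier` threshold, and the toy loss with a bump at the origin are all assumptions made for the example.

```python
import numpy as np

def straight_line_homotopy(path0, path1, s):
    """Intermediate path H(s, .) on a shared t-grid; paths have shape (T, d)."""
    return (1.0 - s) * path0 + s * path1

def max_loss_over_homotopy(loss_fn, path0, path1, num_s=21):
    """Largest loss seen on any sampled intermediate path."""
    worst = -np.inf
    for s in np.linspace(0.0, 1.0, num_s):
        path_s = straight_line_homotopy(path0, path1, s)
        worst = max(worst, max(loss_fn(theta) for theta in path_s))
    return worst

def looks_homotopic(loss_fn, path0, path1, barrier, num_s=21):
    """True if the straight-line homotopy stays strictly below `barrier`."""
    return max_loss_over_homotopy(loss_fn, path0, path1, num_s) < barrier

# Toy 2D loss: a narrow high-loss bump centred at the origin.
def bump_loss(theta):
    return float(np.exp(-np.sum(theta**2) / 0.05))

# Three paths from (-1, 0) to (1, 0), sampled on the same t-grid.
t = np.linspace(0.0, 1.0, 51)
x, y = -np.cos(np.pi * t), np.sin(np.pi * t)
upper = np.stack([x, y], axis=1)          # passes above the bump
upper_flat = np.stack([x, 0.5 * y], axis=1)  # also above, but closer to it
lower = np.stack([x, -y], axis=1)         # passes below the bump

same_side = looks_homotopic(bump_loss, upper, upper_flat, barrier=0.5)
opposite_sides = looks_homotopic(bump_loss, upper, lower, barrier=0.5)
print(same_side, opposite_sides)
```

The two upper paths can be deformed into each other without crossing the bump, while interpolating between the upper and lower paths sweeps straight through it. In practice the choice of `barrier` and the sampling density in `s` and `t` would both need tuning.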