Categorical Invariants of Learning Dynamics
Abdulrahman Tamim
TL;DR
The paper reframes neural network learning as a structure-preserving functor $\mathcal{L}: \mathbf{Param} \to \mathbf{Rep}$. The functor links parameter-space trajectories to the evolution of learned representations and exposes invariants, such as homotopy classes of optimization paths, that constrain generalization. It combines homotopy theory, persistent homology, and universal properties (limits and colimits) in a 2-categorical framework $\mathbf{Learn}$, and introduces practical algorithms for detecting homotopy classes, computing persistence diagrams, and performing pullback-based transfer. Empirically, MNIST networks cluster into a small number of homotopy classes with stable generalization within each class, and on CIFAR-10 the persistence of minima correlates strongly with generalization ($R^2 = 0.82$). Transfer learning is recast as a pullback construction, which enables efficient knowledge transfer and helps explain cross-domain effectiveness, while natural-gradient methods and Fisher-enriched geometry give a principled view of optimization paths. The work also provides practical tools, identifies open questions about scaling the categorical framework, and points to connections with related fields, offering a principled language for why and how learning works that goes beyond parameter values alone.
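A minimal sketch of what detecting homotopy classes of optimization paths can mean, under strong simplifying assumptions: parameter space is taken to be the plane $\mathbb{R}^2$ with one point `obstacle` removed, standing in (hypothetically) for an impassable high-loss region. In an unobstructed Euclidean space every two paths with the same endpoints are homotopic, so nontrivial classes require excluded regions. In the punctured plane, two paths with shared endpoints are homotopic relative to those endpoints exactly when the loop formed by the first path followed by the reverse of the second has winding number zero around the puncture. The helpers `winding_number` and `homotopic_rel_endpoints` are illustrative names, not the paper's algorithm.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def winding_number(loop: List[Point], center: Point) -> int:
    """Winding number of a closed polygonal loop around `center`.

    Each edge (including the closing edge from the last vertex back to the
    first) is treated as a straight segment. A straight segment that does not
    contain `center` subtends an angle of magnitude less than pi, so summing
    the signed angle of every edge gives the total angle swept exactly.
    Assumes no edge passes through `center`.
    """
    cx, cy = center
    total = 0.0
    n = len(loop)
    for k in range(n):
        x0, y0 = loop[k][0] - cx, loop[k][1] - cy
        x1, y1 = loop[(k + 1) % n][0] - cx, loop[(k + 1) % n][1] - cy
        # Signed angle from (x0, y0) to (x1, y1): atan2(cross, dot).
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return round(total / (2 * math.pi))


def homotopic_rel_endpoints(path_a: List[Point], path_b: List[Point],
                            obstacle: Point) -> bool:
    """True if two paths with shared endpoints are homotopic in R^2 minus `obstacle`.

    Hypothetical toy model: `obstacle` stands in for an excluded high-loss region.
    """
    if path_a[0] != path_b[0] or path_a[-1] != path_b[-1]:
        raise ValueError("paths must share start and end points")
    # Closed loop: path_a forward, then path_b backward (shared endpoints not repeated).
    loop = path_a + path_b[::-1][1:-1]
    return winding_number(loop, obstacle) == 0


if __name__ == "__main__":
    origin = (0.0, 0.0)
    above = [(-1.0, 0.0), (-1.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    arch = [(-1.0, 0.0), (0.0, 2.0), (1.0, 0.0)]
    below = [(-1.0, 0.0), (-1.0, -1.0), (1.0, -1.0), (1.0, 0.0)]
    print(homotopic_rel_endpoints(above, arch, origin))   # True
    print(homotopic_rel_endpoints(above, below, origin))  # False
```

With the puncture at the origin, two different routes that both pass above it fall in the same class, while a route above and a route below do not, even though all three share their endpoints.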
Abstract
Neural network training is typically viewed as gradient descent on a loss surface. We propose a fundamentally different perspective: learning is a structure-preserving transformation, a functor $\mathcal{L}$, from the category of network parameters ($\mathbf{Param}$) to the category of learned representations ($\mathbf{Rep}$). This categorical framework reveals that training runs with similar test performance often belong to the same homotopy class of optimization paths, meaning their trajectories can be continuously deformed into one another. We show experimentally that networks converging via homotopic trajectories generalize to within 0.5% accuracy of each other, while networks reached by non-homotopic paths differ by more than 3%. The theory also provides practical tools: persistent homology identifies stable minima whose persistence predicts generalization ($R^2 = 0.82$), pullback constructions formalize transfer learning, and 2-categorical structures explain when different optimization algorithms yield functionally equivalent models. These categorical invariants offer both theoretical insight into why deep learning works and concrete algorithmic principles for training more robust networks.
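The claim that persistent homology identifies stable minima can be illustrated with 0-dimensional sublevel-set persistence of a loss sampled along a one-dimensional slice of parameter space (for example, a hypothetical linear interpolation between two checkpoints; the paper's own persistence computation may use richer complexes). Sweeping a threshold upward, each local minimum gives birth to a connected component at its loss value; when two components merge at a barrier, the one with the higher minimum dies there (the elder rule). The bar length, death minus birth, measures how deep and well separated a basin is. The function `sublevel_persistence_1d` is a hypothetical helper written for this sketch.

```python
import math
from typing import List, Tuple


def sublevel_persistence_1d(values: List[float]) -> List[Tuple[float, float]]:
    """0-dimensional sublevel-set persistence of a sampled 1-D loss profile.

    Returns (birth, death) pairs sorted by persistence (death - birth),
    longest first. The component of the global minimum never dies and is
    reported with death = math.inf. Zero-length bars are omitted.
    """
    n = len(values)
    if n == 0:
        return []
    parent = list(range(n))
    birth = list(values)  # for each root: the minimum loss in its component
    added = [False] * n
    pairs = []

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add samples in order of increasing loss (the sublevel-set filtration).
    for i in sorted(range(n), key=lambda k: values[k]):
        added[i] = True
        for j in (i - 1, i + 1):  # neighbors along the 1-D slice
            if 0 <= j < n and added[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: make ri the younger component (higher minimum).
                if birth[ri] < birth[rj]:
                    ri, rj = rj, ri
                # The younger component dies at the current loss value.
                if birth[ri] < values[i]:
                    pairs.append((birth[ri], values[i]))
                parent[ri] = rj

    # Once every sample is added the slice is connected; its root holds the global minimum.
    pairs.append((birth[find(0)], math.inf))
    return sorted(pairs, key=lambda p: p[1] - p[0], reverse=True)


if __name__ == "__main__":
    # Hypothetical loss values along an interpolation path.
    profile = [0.0, 5.0, 0.5, 1.0, 0.8, 2.0]
    print(sublevel_persistence_1d(profile))
    # → [(0.0, inf), (0.5, 5.0), (0.8, 1.0)]
```

In this profile the basin at 0.5 is separated from the global minimum by a high barrier (bar length 4.5), whereas the dip at 0.8 is shallow (bar length about 0.2). Any persistence threshold for calling a minimum "stable" would be a free choice that this sketch does not fix.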
