Spontaneous Kolmogorov-Arnold Geometry in Shallow MLPs
Michael H. Freedman, Michael Mulligan
TL;DR
This work demonstrates that KA geometry, defined by a distinctive local Jacobian texture in the first-layer map, can spontaneously emerge during training of shallow MLPs. By analyzing the exterior-power minors of the Jacobian, the authors identify zero rows and minor concentration as robust signatures, and show that such KA-like structure appears in a Goldilocks regime (e.g., XOR-type targets) and correlates with learning progress. They introduce metrics (zero rows, participation ratios, random rotation ratios, column divergences) and use these metrics and their dynamics to chart a KA phase diagram, with implications for accelerating learning via targeted interventions. The findings provide a foundation for understanding how nonlinear first-layer geometry shapes the representations prepared for downstream processing and generalization, and for extending KA insights to larger architectures.
Abstract
The Kolmogorov-Arnold (KA) representation theorem constructs universal but highly non-smooth inner functions (the first-layer map) in a neural network with a single non-linear hidden layer. Such universal functions have a distinctive local geometry, a "texture," which can be characterized by the inner function's Jacobian $J(\mathbf{x})$ as $\mathbf{x}$ varies over the data. It is natural to ask whether this distinctive KA geometry emerges through conventional neural network optimization. We find that KA geometry is indeed often produced when training vanilla single-hidden-layer neural networks. We quantify KA geometry through the statistical properties of the exterior powers of $J(\mathbf{x})$: the number of zero rows and various observables for the minor statistics of $J(\mathbf{x})$, which measure the scale and axis alignment of $J(\mathbf{x})$. This leads to a rough understanding of where KA geometry occurs in the space of function complexity and model hyperparameters. The motivation is, first, to understand how neural networks organically learn to prepare input data for downstream processing and, second, to learn enough about the emergence of KA geometry to accelerate learning through a timely intervention in network hyperparameters. This research is the "flip side" of KA-Networks (KANs): we do not engineer KA into the neural network, but rather watch KA emerge in shallow MLPs.
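
To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a first-layer map of the form $h(\mathbf{x}) = \tanh(W\mathbf{x} + b)$ (the activation is an assumption), counts near-zero rows of $J(\mathbf{x})$ using an assumed tolerance `tol`, enumerates the $k \times k$ minors that make up the $k$-th exterior power of $J(\mathbf{x})$, and summarizes their concentration with the standard participation ratio $(\sum_i v_i^2)^2 / \sum_i v_i^4$. All function names are hypothetical, and the paper's exact observables may be defined differently.

```python
import itertools
import numpy as np

def first_layer_jacobian(W, b, x):
    """Jacobian of h(x) = tanh(W x + b) with respect to x; shape (m, n).
    The tanh activation is an assumption made for illustration."""
    pre = W @ x + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W

def count_zero_rows(J, tol=1e-3):
    """Number of rows of J whose Euclidean norm is below tol (assumed threshold)."""
    return int(np.sum(np.linalg.norm(J, axis=1) < tol))

def minors(J, k):
    """All k x k minors of J, i.e. the entries of the k-th exterior power of J.
    Rows index k-subsets of hidden units, columns index k-subsets of inputs."""
    m, n = J.shape
    row_sets = list(itertools.combinations(range(m), k))
    col_sets = list(itertools.combinations(range(n), k))
    out = np.empty((len(row_sets), len(col_sets)))
    for i, r in enumerate(row_sets):
        for j, c in enumerate(col_sets):
            out[i, j] = np.linalg.det(J[np.ix_(r, c)])
    return out

def participation_ratio(values):
    """(sum v^2)^2 / sum v^4: equals 1 when a single entry dominates and
    len(values) when all entries have equal magnitude."""
    v2 = np.ravel(values) ** 2
    denom = np.sum(v2 ** 2)
    return float(np.sum(v2) ** 2 / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 3))   # 5 hidden units, 3 inputs (illustrative sizes)
    b = rng.normal(size=5)
    x = rng.normal(size=3)
    J = first_layer_jacobian(W, b, x)
    M2 = minors(J, 2)             # shape (10, 3): C(5,2) row sets by C(3,2) column sets
    print(count_zero_rows(J), participation_ratio(M2))
```

In practice one would evaluate these statistics at many data points $\mathbf{x}$ and track them over training. A low participation ratio among the minors indicates that a few minors dominate, which is one way to read the "minor concentration" signature described above.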
