Spontaneous Kolmogorov-Arnold Geometry in Shallow MLPs

Michael H. Freedman, Michael Mulligan

TL;DR

This work demonstrates that KA geometry, defined by a distinctive local Jacobian texture in the first-layer map, can spontaneously emerge during training of shallow MLPs. By analyzing the exterior-power minors of the Jacobian, the authors identify zero rows and minor concentration as robust signatures, and show that such KA-like structure appears in a Goldilocks regime (e.g., xor-type targets) and correlates with learning progress. They introduce metrics (zero rows, participation ratios, random rotation ratios, column divergences) and dynamics to chart a KA phase diagram, with implications for accelerating learning via targeted interventions. The findings provide a foundation for understanding how nonlinear first-layer geometry influences representations prepared for downstream processing and generalization, and for extending KA insights to larger architectures.

Abstract

The Kolmogorov-Arnold (KA) representation theorem constructs universal but highly non-smooth inner functions (the first layer map) in a single (non-linear) hidden layer neural network. Such universal functions have a distinctive local geometry, a "texture," which can be characterized by the inner function's Jacobian $J({\mathbf{x}})$, as $\mathbf{x}$ varies over the data. It is natural to ask if this distinctive KA geometry emerges through conventional neural network optimization. We find that KA geometry is indeed often produced when training vanilla single hidden layer neural networks. We quantify KA geometry through the statistical properties of the exterior powers of $J(\mathbf{x})$: number of zero rows and various observables for the minor statistics of $J(\mathbf{x})$, which measure the scale and axis alignment of $J(\mathbf{x})$. This leads to a rough understanding of where KA geometry occurs in the space of function complexity and model hyperparameters. The motivation is first to understand how neural networks organically learn to prepare input data for later downstream processing and, second, to learn enough about the emergence of KA geometry to accelerate learning through a timely intervention in network hyperparameters. This research is the "flip side" of KA-Networks (KANs). We do not engineer KA into the neural network, but rather watch KA emerge in shallow MLPs.
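
The quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes a first-layer map of the form $h(\mathbf{x}) = \tanh(W\mathbf{x} + b)$, so that $J(\mathbf{x}) = \mathrm{diag}(1 - \tanh^2(W\mathbf{x} + b))\,W$; the tanh activation, the names `W`, `b`, `first_layer_jacobian`, `minor_matrix` and `ranked_row_means`, and the use of the mean absolute minor as the "row mean" are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def first_layer_jacobian(W, b, x):
    """Jacobian of h(x) = tanh(W x + b) with respect to x, shape (m, n).

    The tanh activation is an assumption of this sketch.
    """
    pre = W @ x + b
    return (1.0 - np.tanh(pre) ** 2)[:, None] * W


def minor_matrix(J, k):
    """All size-k minors of J, i.e. the matrix of the k-th exterior power.

    Rows are indexed by k-subsets of hidden neurons, columns by
    k-subsets of input coordinates (both in lexicographic order).
    """
    m, n = J.shape
    rows = list(combinations(range(m), k))
    cols = list(combinations(range(n), k))
    M = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            M[i, j] = np.linalg.det(J[np.ix_(r, c)])
    return M


def ranked_row_means(J, k):
    """Mean absolute size-k minor per row, sorted in ascending order."""
    return np.sort(np.abs(minor_matrix(J, k)).mean(axis=1))


# Toy example with random (untrained) weights: 6 hidden neurons, 3 inputs.
rng = np.random.default_rng(0)
m, n = 6, 3
W = rng.normal(size=(m, n))
b = rng.normal(size=m)
x = rng.normal(size=n)

J = first_layer_jacobian(W, b, x)
M2 = minor_matrix(J, 2)          # shape (15, 3): C(6,2) rows, C(3,2) columns
row_means = ranked_row_means(J, 2)

# A zero row in J forces every minor that uses that row to vanish.
J_zero = J.copy()
J_zero[0] = 0.0
M2_zero = minor_matrix(J_zero, 2)  # the first 5 rows involve neuron 0
```

Because a zero row of $J(\mathbf{x})$ makes every minor containing that row vanish, a single inactive neuron produces many zero rows in the higher exterior powers. The explicit enumeration above scales combinatorially in the hidden dimension, so it is only practical for small models or small $k$.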

Paper Structure

This paper contains 17 sections, 20 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Weight and Jacobian heatmaps of a model before and after training, plotted on a linear scale. The model has $16$ hidden neurons and was trained on the xor function \ref{xorfunction}. The right panel is the Jacobian of a single, randomly chosen example. There is one row of relatively small values in the trained weight matrix $A^T$: row 8. Consistent with \ref{jacobianformula}, this row produces a correspondingly small row in the trained Jacobian. There are many relatively small/large rows of the trained Jacobian that are not directly correlated with the weight matrix $A^T$, but must instead result from the derivative factor in \ref{jacobianformula}, suggestive of KA geometry.
  • Figure 2: Row means of size-$k$ minor matrices of the trained and initial xor$(32)$ model, ranked in ascending order. Horizontal lines mark row mean quantiles of the model at initialization. Size-$3$ row means can approach machine precision (not shown with the $10^{-16}$ lower cutoff).
  • Figure 3: Size-$3$ row mean comparison across target functions.
  • Figure 4: Size-$3$ minor distribution comparison across target functions.
  • Figure 5: Normalized participation ratios (mean $\pm$ standard error) across hidden dimensions.
  • ...and 13 more figures