Rethinking the Function of Neurons in KANs
Mohammed Ghaith Altarabichi
TL;DR
This work questions the default use of a sum as the neuron-level operation in Kolmogorov-Arnold Networks (KANs) and investigates whether a different multivariate function can improve practical utility in high-dimensional settings. The authors systematically compare nine multivariate neuron functions on ten classification datasets using a two-layer KAN and find that the mean yields the strongest and most consistent performance. They attribute this to the mean keeping neuron outputs within the effective range of the spline activations, and they relate it to the Kolmogorov-Arnold representation theorem, which expresses any continuous multivariate function $f(x_1, \ldots, x_n)$ as $f = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^n \phi_{q,p}(x_p) \right)$. Mean-based KANs improve both accuracy and training stability relative to a standard sum-based KAN and outperform KAN variants with Layer Normalization on several datasets. The study offers practical design guidance for KANs: a simple averaging operation is consistent with the theoretical underpinnings and yields tangible gains on tabular data, with potential extensions to other KAN architectures.
Abstract
The neurons of Kolmogorov-Arnold Networks (KANs) perform a simple summation motivated by the Kolmogorov-Arnold representation theorem, which asserts that the sum is the only fundamental multivariate function. In this work, we investigate the potential for identifying an alternative multivariate function for KAN neurons that may offer increased practical utility. Our empirical research involves testing various multivariate functions in KAN neurons across a range of benchmark Machine Learning tasks. Our findings indicate that substituting the sum with the average function in KAN neurons results in significant performance enhancements compared to traditional KANs. Our study demonstrates that this minor modification contributes to the stability of training by confining the input to the spline within the effective range of the activation function. Our implementation and experiments are available at: https://github.com/Ghaith81/dropkan
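The range argument can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes, purely for illustration, that each edge activation is bounded in [-1, 1] (a `tanh` stands in for a learned spline) and that the next layer's spline grid also covers [-1, 1]. The names `neuron_outputs` and `fraction_outside` are hypothetical helpers introduced here.

```python
import math
import random


def neuron_outputs(n_inputs, n_samples=1000, seed=0):
    """Return (sums, means) of simulated per-edge activations for one neuron."""
    rng = random.Random(seed)
    sums, means = [], []
    for _ in range(n_samples):
        # Assumption: tanh of Gaussian noise stands in for the spline
        # outputs phi_p(x_p); each value lies in [-1, 1].
        edge_outputs = [math.tanh(rng.gauss(0.0, 1.0)) for _ in range(n_inputs)]
        total = sum(edge_outputs)
        sums.append(total)              # standard KAN neuron: sum
        means.append(total / n_inputs)  # alternative neuron: mean
    return sums, means


def fraction_outside(values, lo=-1.0, hi=1.0):
    """Fraction of values falling outside the assumed grid range [lo, hi]."""
    return sum(1 for v in values if v < lo or v > hi) / len(values)


if __name__ == "__main__":
    for n in (4, 64, 1024):
        sums, means = neuron_outputs(n)
        print(n, fraction_outside(sums), fraction_outside(means))
```

Under these assumptions the mean of bounded values stays inside the same bounds for any input width, whereas the spread of the sum grows with the number of inputs, so a growing share of summed outputs lands outside the grid that the following spline is defined on.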
