AF-KAN: Activation Function-Based Kolmogorov-Arnold Networks for Efficient Representation Learning
Hoang-Thang Ta, Anh Tran
TL;DR
AF-KAN introduces Activation Function-Based Kolmogorov-Arnold Networks to address ReLU-KAN limitations by enabling diverse activation mixtures and applying attention-based parameter reduction with data normalization. Built on the Kolmogorov-Arnold Representation Theorem, AF-KAN generalizes edge-based activations to handle multiple inputs, using function sets $\mathbf{A}$ formed from shifted activations and various function types up to degree three. Through global/spatial attention and multi-step linear transformations, AF-KAN achieves accuracy competitive with or superior to that of MLPs and other KANs at similar parameter counts, notably on MNIST and Fashion-MNIST, at the expense of longer training times and higher FLOPs. Ablation studies identify SiLU as a strong default activation, quad1 as a robust function type, and small grid sizes with third-order splines as effective settings, with two normalization schemes (L2MM and PLN) being crucial for performance. These results suggest AF-KAN as a promising, parameter-efficient alternative for image classification, while highlighting the need for further optimization for scalability and speed.
Abstract
Kolmogorov-Arnold Networks (KANs) have inspired numerous works exploring their applications across a wide range of scientific problems, with the potential to replace Multilayer Perceptrons (MLPs). While many KANs are designed using basis and polynomial functions, such as B-splines, ReLU-KAN utilizes a combination of ReLU functions to mimic the structure of B-splines and take advantage of ReLU's speed. However, ReLU-KAN is not built for multiple inputs, and it is further limited by ReLU's handling of negative values, which can restrict feature extraction. To address these issues, we introduce Activation Function-Based Kolmogorov-Arnold Networks (AF-KAN), expanding ReLU-KAN with various activations and their function combinations. This novel KAN also incorporates parameter reduction methods, primarily attention mechanisms and data normalization, to enhance performance on image classification datasets. We explore different activation functions, function combinations, grid sizes, and spline orders to validate the effectiveness of AF-KAN and determine its optimal configuration. In the experiments, AF-KAN significantly outperforms MLP, ReLU-KAN, and other KANs with the same parameter count. It also remains competitive even when using 6 to 10 times fewer parameters while maintaining the same network structure. However, AF-KAN requires a longer training time and consumes more FLOPs. The repository for this work is available at https://github.com/hoangthangta/All-KAN.
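
To make the idea of mimicking a B-spline with shifted activations more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the bell-shaped basis commonly attributed to ReLU-KAN, the squared product of two shifted ReLUs rescaled to peak at 1 on an interval [s, e], and then swaps ReLU for SiLU to show how the activation's treatment of negative inputs changes the basis. The names `bump_basis`, `relu`, and `silu` are hypothetical, and the function types used by AF-KAN itself (such as quad1) are not reproduced here.

```python
import math


def relu(x):
    """Standard ReLU: zero for negative inputs."""
    return max(0.0, x)


def silu(x):
    """SiLU (x * sigmoid(x)): small but nonzero for negative inputs."""
    return x / (1.0 + math.exp(-x))


def bump_basis(x, s, e, act=relu):
    """Basis built from two shifted activations on the interval [s, e].

    Assumed form (illustrative only):
        [act(e - x) * act(x - s)]^2 * 16 / (e - s)^4
    With act=relu this is zero outside [s, e] and equals 1 at the midpoint.
    """
    norm = 16.0 / (e - s) ** 4
    return (act(e - x) * act(x - s)) ** 2 * norm


# ReLU version: compact support, peak value 1 at the midpoint.
print(bump_basis(0.5, 0.0, 1.0))             # inside the interval
print(bump_basis(-1.0, 0.0, 1.0))            # outside the interval

# SiLU version: the response no longer vanishes outside [s, e].
print(bump_basis(-1.0, 0.0, 1.0, act=silu))
```

With ReLU, any input outside [s, e] contributes nothing to the basis, which is one way to read the abstract's remark about negative values restricting feature extraction. With a smooth activation such as SiLU, the same construction still responds outside the interval.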
