AF-KAN: Activation Function-Based Kolmogorov-Arnold Networks for Efficient Representation Learning

Hoang-Thang Ta, Anh Tran

TL;DR

AF-KAN introduces Activation Function-Based Kolmogorov-Arnold Networks to address ReLU-KAN limitations by enabling diverse activation mixtures and applying attention-based parameter reduction with data normalization. Built on the Kolmogorov-Arnold Representation Theorem, AF-KAN generalizes edge-based activations to handle multiple inputs, using function sets $\mathbf{A}$ formed from shifted activations and various function types up to degree three. Through global/spatial attention and multi-step linear transformations, AF-KAN achieves competitive or superior accuracy to MLPs and other KANs at similar parameter counts, notably on MNIST and Fashion-MNIST, at the expense of longer training times and higher FLOPs. Ablation studies identify SiLU as a strong default activation, quad1 as a robust function type, and small grid sizes with third-order splines as effective settings, with two normalization schemes (L2MM and PLN) being crucial for performance. These results suggest AF-KAN as a promising, parameter-efficient alternative for image classification, while highlighting the need for further optimization for scalability and speed.
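For reference, the Kolmogorov-Arnold Representation Theorem mentioned above is usually stated as follows. This is the classical textbook form, not an equation copied from the paper. The symbols $\Phi_q$ (outer functions) and $\phi_{q,p}$ (inner univariate functions) follow the common convention and may differ from the paper's own indexing, such as the three-index $\phi_{1,1,1}$ used in Figure 1.

$$
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
$$

Here $f$ is a continuous function on $[0,1]^n$, and every $\Phi_q$ and $\phi_{q,p}$ is a continuous function of a single variable. KANs place learnable univariate functions of this kind on network edges.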

Abstract

Kolmogorov-Arnold Networks (KANs) have inspired numerous works exploring their applications across a wide range of scientific problems, with the potential to replace Multilayer Perceptrons (MLPs). While many KANs are designed using basis and polynomial functions, such as B-splines, ReLU-KAN uses a combination of ReLU functions to mimic the structure of B-splines and take advantage of ReLU's speed. However, ReLU-KAN is not built for multiple inputs, and its limitations stem from ReLU's handling of negative values, which can restrict feature extraction. To address these issues, we introduce Activation Function-Based Kolmogorov-Arnold Networks (AF-KAN), expanding ReLU-KAN with various activations and their function combinations. This novel KAN also incorporates parameter reduction methods, primarily attention mechanisms and data normalization, to enhance performance on image classification datasets. We explore different activation functions, function combinations, grid sizes, and spline orders to validate the effectiveness of AF-KAN and determine its optimal configuration. In the experiments, AF-KAN significantly outperforms MLP, ReLU-KAN, and other KANs with the same parameter count. It also remains competitive with 6 to 10 times fewer parameters while keeping the same network structure. However, AF-KAN requires a longer training time and consumes more FLOPs. The repository for this work is available at https://github.com/hoangthangta/All-KAN.
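The idea of mimicking B-splines with shifted activations can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the basis form (a squared product of two shifted ReLUs with a constant normalizer) follows the ReLU-KAN formulation as commonly described, and the function name `shifted_basis`, its `act` parameter, and the helper functions are hypothetical. Swapping in SiLU only illustrates the general idea of replacing the activation and should not be read as the paper's exact definition of its function set or of the quad1 type.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def shifted_basis(x, grid_size=5, spline_order=3, act=relu):
    """Evaluate G + k shifted, squared activation products at the points x.

    Hypothetical helper: with act=relu, each column is a bell-shaped bump
    supported on [s_i, e_i], similar in shape to a B-spline basis function.
    """
    x = np.asarray(x, dtype=float)[..., None]                  # (..., 1)
    starts = np.arange(-spline_order, grid_size) / grid_size   # s_i, (G + k,)
    ends = starts + (spline_order + 1) / grid_size             # e_i, (G + k,)
    # Constant normalizer chosen so that each ReLU bump peaks at exactly 1.
    norm = 4.0 * grid_size**2 / (spline_order + 1) ** 2
    return (act(x - starts) * act(ends - x) * norm) ** 2

xs = np.linspace(0.0, 1.0, 11)
print(shifted_basis(xs).shape)            # (11, 8): G + k = 5 + 3 basis functions
print(shifted_basis(xs, act=silu).shape)  # same shape with a different activation
```

With ReLU, each basis function is exactly zero outside its interval, so inputs on the wrong side of a shift contribute nothing. A smooth activation such as SiLU keeps a small nonzero response there, which is one way to read the abstract's point about ReLU's handling of negative values.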

Paper Structure

This paper contains 31 sections, 50 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Left: The structure of KAN(2,3,1). Right: Computing $\phi_{1,1,1}$ from control points and B-splines [ta2024fc]. $G$ and $k$ denote the grid size and the spline order, and the number of B-splines, $n$, is given by $G + k = 3 + 3 = 6$.
  • Figure 2: Simulated plot of $\mathbf{R}$ with a grid size $G = 5$ and a spline order $k = 3$.
  • Figure 3: Simulated plots of the function $\mathbf{A}$ using several activation functions. The function is configured with a grid size of $G = 5$, a spline order of $k = 3$, and the function type quad1.
  • Figure 4: Flow of an input through AF-KAN, MLP, and ReLU-KAN layers. Left: AF-KAN enables a broader range of functions using L2 norm, min-max scaling, and attention mechanisms to reduce parameters. Center: MLP applies a single activation function and linear transformation. Right: ReLU-KAN uses the "Square of ReLU" with a constant norm and a 2D convolutional layer.
  • Figure 5: The output labels of MNIST (first row) and Fashion-MNIST (second row).
  • ...and 3 more figures