
PRKAN: Parameter-Reduced Kolmogorov-Arnold Networks

Hoang-Thang Ta, Duy-Quy Thai, Anh Tran, Grigori Sidorov, Alexander Gelbukh

TL;DR

The paper tackles the high parameter cost of Kolmogorov-Arnold Networks (KANs) and introduces PRKAN, a Parameter-Reduced KAN, to align parameter counts with MLPs. PRKAN integrates attention, convolutional components, dimension summation, and feature-weight vectors, along with data normalization, to compress KAN layers without altering the overall network structure. On MNIST and Fashion-MNIST, PRKAN variants—especially with attention and layer normalization—achieve competitive validation accuracy relative to MLPs while maintaining similar parameter budgets, with GRBFs often providing faster, more accurate results than B-splines. The work demonstrates that KANs can be made efficient and competitive, offering a pathway to lightweight KANs for image tasks and beyond.
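The TL;DR describes compressing a KAN layer by collapsing the basis-function axis before an ordinary linear map, with GRBFs as the basis and layer normalization applied to the data. The sketch below is only our reading of that description, not the authors' code. It expands each input feature into Gaussian RBF features, reduces them either with a shared feature-weight vector ("fwv") or by plain summation ("dim-sum"), applies layer normalization, and then applies a dense D → d_out map. The grid range, the RBF width (set equal to the grid spacing), the number of centers, the initialization, and the names `grbf_basis`, `layer_norm`, and `PRKANLayerSketch` are all hypothetical choices made for illustration.

```python
import math
import random


def grbf_basis(x, grid_min=-2.0, grid_max=2.0, num_centers=8):
    """Expand one scalar into num_centers Gaussian RBF features on a uniform grid."""
    step = (grid_max - grid_min) / (num_centers - 1)
    centers = [grid_min + i * step for i in range(num_centers)]
    # Width = grid spacing (an assumption, not taken from the paper).
    return [math.exp(-(((x - c) / step) ** 2)) for c in centers]


def layer_norm(row, eps=1e-5):
    """Normalize one (D,) row to zero mean and unit variance (no affine parameters)."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


class PRKANLayerSketch:
    """Hypothetical reduced KAN layer: (B, D) -> (B, D, n) -> (B, D) -> (B, d_out)."""

    def __init__(self, d_in, d_out, num_centers=8, mode="fwv", seed=0):
        if mode not in ("fwv", "dim-sum"):
            raise ValueError("mode must be 'fwv' or 'dim-sum'")
        rng = random.Random(seed)
        self.num_centers = num_centers
        self.mode = mode
        # fwv: one learnable weight per basis function, shared by all D inputs.
        # dim-sum: no extra parameters; the basis axis is simply summed.
        self.fwv = [rng.uniform(-1.0, 1.0) for _ in range(num_centers)] if mode == "fwv" else []
        # Plain dense map D -> d_out, the same parameter budget as an MLP layer.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        self.bias = [0.0] * d_out

    def num_params(self):
        return len(self.fwv) + sum(len(r) for r in self.weight) + len(self.bias)

    def forward(self, batch):
        outputs = []
        for row in batch:
            # (D,) -> (D, n): expand every input feature with GRBFs.
            expanded = [grbf_basis(x, num_centers=self.num_centers) for x in row]
            # (D, n) -> (D,): collapse the basis axis.
            if self.mode == "fwv":
                reduced = [sum(w * b for w, b in zip(self.fwv, basis)) for basis in expanded]
            else:
                reduced = [sum(basis) for basis in expanded]
            # Layer normalization, then the MLP-style dense map (D,) -> (d_out,).
            normed = layer_norm(reduced)
            outputs.append([sum(w * v for w, v in zip(w_row, normed)) + b
                            for w_row, b in zip(self.weight, self.bias)])
        return outputs
```

For a batch of two 4-feature rows, `PRKANLayerSketch(4, 3).forward(...)` returns two rows of 3 outputs. The "fwv" variant holds 8 + 4·3 + 3 = 23 parameters, and the "dim-sum" variant holds 15. In both cases, the only cost beyond an MLP layer is the small basis-reduction step.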

Abstract

Kolmogorov-Arnold Networks (KANs) represent an innovation in neural network architectures, offering a compelling alternative to Multi-Layer Perceptrons (MLPs) in models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformers. By advancing network design, KANs drive groundbreaking research and enable transformative applications across various scientific domains involving neural networks. However, existing KANs often require significantly more parameters in their network layers than MLPs. To address this limitation, this paper introduces PRKANs (Parameter-Reduced Kolmogorov-Arnold Networks), which employ several methods to reduce the parameter count in KAN layers, making them comparable to MLP layers. Experimental results on the MNIST and Fashion-MNIST datasets demonstrate that PRKANs outperform several existing KANs, and their variant with attention mechanisms rivals the performance of MLPs, albeit with slightly longer training times. Furthermore, the study highlights the advantages of Gaussian Radial Basis Functions (GRBFs) and layer normalization in KAN designs. The repository for this work is available at: https://github.com/hoangthangta/All-KAN.
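To make the parameter gap concrete, here is a back-of-the-envelope count. It assumes a B-spline KAN layer in which every input-output edge carries $G + k$ spline coefficients plus one base weight. This matches common efficient KAN implementations, although other implementations add a few more per-edge scalars. The "reduced" count is a hypothetical layer that collapses the basis axis with one shared $(G + k)$-length weight vector before an ordinary dense layer. The layer sizes and $G = 5$, $k = 3$ are illustrative values, not settings reported in the paper.

```python
def mlp_layer_params(d_in, d_out, bias=True):
    """Weights plus optional bias of a dense layer."""
    return d_in * d_out + (d_out if bias else 0)


def bspline_kan_layer_params(d_in, d_out, grid_size, spline_order):
    """Assumed per-edge cost: (G + k) spline coefficients + 1 base weight."""
    n_basis = grid_size + spline_order
    return d_in * d_out * (n_basis + 1)


def reduced_kan_layer_params(d_in, d_out, grid_size, spline_order, bias=True):
    """Hypothetical: a shared (G + k)-length weight vector collapses the basis
    axis, followed by an ordinary dense layer D -> d_out."""
    return (grid_size + spline_order) + mlp_layer_params(d_in, d_out, bias)


if __name__ == "__main__":
    d_in, d_out, G, k = 784, 64, 5, 3  # e.g. a flattened 28x28 MNIST image
    print("MLP:         ", mlp_layer_params(d_in, d_out))
    print("B-spline KAN:", bspline_kan_layer_params(d_in, d_out, G, k))
    print("reduced KAN: ", reduced_kan_layer_params(d_in, d_out, G, k))
```

Under these assumptions, a 784 → 64 layer costs 50,240 parameters as an MLP layer, 451,584 as a B-spline KAN layer, and 50,248 as the reduced layer. The KAN layer is roughly nine times larger, and collapsing the basis axis closes most of that gap.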

Paper Structure (28 sections, 40 equations, 7 figures, 9 tables)

Figures (7)

  • Figure 1: Left: The structure of KAN(2,3,1). Right: An illustration of how $\phi_{1,1,1}$ is calculated from control points and B-splines (ta2024fc). $G$ and $k$ are the grid size and the spline order; the number of B-splines equals $G + k = 3 + 3 = 6$.
  • Figure 2: Plots of outputs generated by Sigmoid and B-spline functions. The more B-splines are used, the smoother the curves become.
  • Figure 3: The diagram illustrates how the input $(B, D)$ is passed through both MLP and PRKAN layers to produce the output $(B, d_{\text{out}})$. Convolutional layers can be used independently or in combination with pooling layers to reduce data dimensionality.
  • Figure 4: The architectures of the PRKAN methods (attn, conv, conv&pool, dim-sum, and fwv) and of the MLP baseline (base), with suggested places to apply data normalization. Two key positions (1 & 2) are proposed for applying data normalization to tensors of shape $(B, D)$.
  • Figure 5: Models plotted by FLOPs and validation accuracy. PRKAN-conv&pool models are excluded because of their very high FLOP counts, and PRKAN-attn and PRKAN-dim-sum models without data normalization are excluded because of their poor validation accuracy.
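Figure 1 computes $\phi_{1,1,1}$ as a weighted sum of control points and B-splines, with $G + k$ basis functions. Below is a minimal sketch of that computation using the Cox-de Boor recursion. It assumes a uniform grid on $[-1, 1]$ with $G$ intervals, extended by $k$ extra knots on each side. The grid range, the knot layout, and the helper names `extended_knots`, `bspline_basis`, and `phi` are illustrative assumptions, not details taken from the paper.

```python
def extended_knots(grid_size, spline_order, lo=-1.0, hi=1.0):
    """Uniform knots on [lo, hi] with G intervals, padded by k knots on each side."""
    h = (hi - lo) / grid_size
    return [lo + i * h for i in range(-spline_order, grid_size + spline_order + 1)]


def bspline_basis(x, knots, degree):
    """All degree-`degree` B-spline basis values at x via the Cox-de Boor recursion."""
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(len(knots) - 1)]
    for p in range(1, degree + 1):
        basis = [
            (x - knots[i]) / (knots[i + p] - knots[i]) * basis[i]
            + (knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1]) * basis[i + 1]
            for i in range(len(knots) - p - 1)
        ]
    return basis


def phi(x, control_points, grid_size=3, spline_order=3):
    """phi(x) = sum_i c_i * B_i(x), with G + k control points c_i."""
    knots = extended_knots(grid_size, spline_order)
    basis = bspline_basis(x, knots, spline_order)
    if len(control_points) != len(basis):
        raise ValueError(f"expected {len(basis)} control points (G + k)")
    return sum(c * b for c, b in zip(control_points, basis))
```

With $G = k = 3$, there are 10 knots, and `bspline_basis` returns $10 - 3 - 1 = 6$ functions, matching $G + k = 6$ in the caption. Inside $[-1, 1)$, the basis values sum to 1, so setting all control points equal to a constant $c$ gives $\phi(x) = c$. Points at or beyond the right edge of the padded grid evaluate to 0 because of the half-open intervals.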