Activation Space Selectable Kolmogorov-Arnold Networks

Zhuoqin Yang; Jiansong Zhang; Xiaoling Luo; Zheng Lu; Linlin Shen

TL;DR

This work addresses performance variability in Kolmogorov-Arnold Networks (KAN) by introducing Selectable KAN (S-KAN), which adaptively chooses from a pool of univariate activation functions at each feedforward KAN node. The authors formalize an activation-space framework, implement a three-step training pipeline (full training, selective training, pruning), and extend the idea to Selectable Convolutional KAN (S-ConvKAN) for vision tasks. Empirical results show that S-KAN outperforms baseline methods on seven representative function-fitting tasks and significantly surpasses MLPs with a comparable number of parameters, and that S-ConvKAN achieves leading accuracy on four general image classification datasets, often with competitive parameter counts. The study highlights a data-centric design principle for KAN-based architectures and suggests broad potential for activation-space selectivity in future large-scale AI systems.

Abstract

The multilayer perceptron (MLP), a fundamental paradigm in current artificial intelligence, is widely applied in fields such as computer vision and natural language processing. However, the recently proposed Kolmogorov-Arnold Network (KAN), based on nonlinear additive connections, has been shown to achieve performance comparable to MLPs with significantly fewer parameters. Despite this potential, relying on a single activation function space limits the performance of KAN and related works across different tasks. To address this issue, we propose an activation space Selectable KAN (S-KAN). S-KAN employs an adaptive strategy to choose the activation mode for the data at each feedforward KAN node. Our approach outperforms baseline methods in seven representative function fitting tasks and significantly surpasses MLP methods with a comparable number of parameters. Furthermore, we extend the structure of S-KAN and propose an activation space selectable Convolutional KAN (S-ConvKAN), which achieves leading results on four general image classification datasets. Our method mitigates the performance variability of the original KAN across different tasks and demonstrates through extensive experiments that feedforward KANs with selectable activations can match or even exceed the performance of MLP-based methods. This work contributes to the understanding of the data-centric design of new AI paradigms and provides a foundational reference for innovations in KAN-based network architectures.
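
The adaptive selection described in the abstract can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the candidate pool (identity, tanh, sin), the use of a softmax to turn selection logits into weights, and the names POOL, selectable_edge, and kan_node. The sketch only shows the general shape of the idea: each edge mixes several univariate candidates, and a KAN node sums its edge outputs.

```python
import math

# Hypothetical candidate pool; the paper's actual activation function pool
# is not listed in this summary.
POOL = {
    "identity": lambda x: x,
    "tanh": math.tanh,
    "sin": math.sin,
}


def softmax(logits):
    """Turn raw selection logits into weights that sum to 1 (assumed mechanism)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def selectable_edge(x, logits, scales):
    """One edge: a weighted mix of scaled candidate activations applied to x."""
    weights = softmax(logits)
    return sum(w * s * f(x) for w, s, f in zip(weights, scales, POOL.values()))


def kan_node(inputs, edge_logits, edge_scales):
    """A KAN node adds up the univariate outputs of its incoming edges."""
    return sum(
        selectable_edge(x, logits, scales)
        for x, logits, scales in zip(inputs, edge_logits, edge_scales)
    )


# Usage: two inputs, each edge with its own selection logits and scale factors.
y = kan_node(
    [0.5, -1.0],
    [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]],
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
)
```

In this sketch a large logit makes one candidate dominate an edge, which is one simple way to read "selecting" an activation per node; the paper's actual selection strategy may differ.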

Paper Structure (26 sections, 11 equations, 6 figures, 8 tables, 1 algorithm)

Introduction
Related Work
MLP-based Paradigms
Exploration on KANs
Method
Primarily: Kolmogorov-Arnold Networks
Non-linear Learnable
Activation Space in KANs
Selectable Kolmogorov-Arnold Networks (S-KAN)
Selectable Activation Space
Efficient S-KAN Training
Selectable Convolutional KAN (S-ConvKAN)
Experiments
Implementation Details
Activation Function Pool
...and 11 more sections

Figures (6)

Figure 1: KAN with fixed activation function layers (left) and the activation space Selectable KAN (S-KAN) proposed in this paper (right). Compared to the restricted activation function usage in the original KAN, S-KAN adaptively selects activation functions for each data node, ensuring that the feedforward activation functions are the best choices under the selected strategy.
Figure 2: When fitting a univariate function (black), different fitting strategies (other colors) differ significantly in performance across input values.
Figure 3: The three-step training strategy in S-KAN. Step 1: Full Training. Each candidate nonlinear mapping method in the function pool initializes feedforward weights, which, together with the activation function scale factors, receive gradients from the cost function and are updated. Step 2: Selective Training. Freeze the activation function scale factors and train the selective weights to optimize the cost function. Step 3: Pruning. Prune the unimportant activation functions based on their weights. After pruning, the remaining weights are redistributed to ensure that they sum to 1.
Figure 4: Fitting of four binary functions. Visualization shows that the S-KAN can achieve high-performance fitting across different binary functions (exponential sum of squares, product, division), indicating its universality.
Figure 5: Based on cross entropy, t-SNE visualization results for two image datasets with 10 classes each (CIFAR-10 on top and Fashion MNIST on bottom). "Image" represents the embedding of the original image. "KAN" and "S-KAN" represent the visualization results of the penultimate layer's embeddings of the models. The embedding results based on S-KAN show significantly better feature clustering capability compared to the original KAN.
...and 1 more figure
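
The pruning step described in the Figure 3 caption (drop unimportant activation functions by weight, then redistribute the remaining weights so they sum to 1) can be sketched as follows. The fixed threshold criterion, the fallback of keeping the single best candidate, the function name prune_and_renormalize, and the candidate labels are all assumptions for illustration; the caption does not specify how "unimportant" is decided.

```python
def prune_and_renormalize(weights, threshold=0.1):
    """Drop candidates below `threshold`, then rescale the rest to sum to 1.

    `weights` maps a candidate activation name to its selection weight.
    The threshold rule is an assumed pruning criterion, not the paper's.
    """
    kept = {name: w for name, w in weights.items() if w >= threshold}
    if not kept:
        # Assumed fallback: never prune every candidate, keep the strongest one.
        best = max(weights, key=weights.get)
        kept = {best: weights[best]}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}


# Usage with hypothetical candidate labels.
pruned = prune_and_renormalize({"spline": 0.6, "fourier": 0.35, "poly": 0.05})
```

Here the low-weight candidate is removed and the two survivors are rescaled by their combined weight, so the relative preference between them is unchanged.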
