Free-Knots Kolmogorov-Arnold Network: On the Analysis of Spline Knots and Advancing Stability

Liangwewi Nathan Zheng; Wei Emma Zhang; Lin Yue; Miao Xu; Olaf Maennel; Weitong Chen

TL;DR

This work tackles fixed-knot limitations, excessive parameter counts, and training instability in Kolmogorov-Arnold Networks (KANs) by deriving knot-count bounds and proposing Free-Knots KAN (FR-KAN). FR-KAN combines neuron grouping with weight sharing, free grid shifts, and a $C^2$-continuity training strategy to reduce parameters to the scale of standard MLPs while enabling more flexible activations. The authors validate FR-KAN across image, text, time-series, multimodal, and function-approximation tasks, showing competitive or superior performance and enhanced stability over vanilla KAN and MLP baselines, with interpretable learned activations and a wider activation field from larger grids. Overall, the approach provides practical guidance for scalable KAN deployment and opens avenues for further efficiency gains in spline-based neural architectures.
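
The following sketch makes the parameter gap concrete. It counts trainable parameters for one dense MLP layer and one vanilla KAN layer, where only the $G+K$ spline coefficients per edge are counted and any per-edge scale parameters that some implementations add are ignored. It also counts a hypothetical grouped layer in which one spline is shared by each group of output neurons. The grouped layer, its `n_groups` parameter, and all function names are assumptions made for illustration. They are not the FR-KAN parameterization from the paper.

```python
def mlp_layer_params(d_in, d_out):
    """Dense layer: weight matrix plus bias vector."""
    return d_in * d_out + d_out


def kan_layer_params(d_in, d_out, G, K):
    """Vanilla KAN layer: one learnable spline with G + K coefficients per edge.
    Per-edge scale parameters used by some implementations are not counted."""
    return d_in * d_out * (G + K)


def grouped_layer_params(d_in, d_out, G, K, n_groups):
    """HYPOTHETICAL grouped layer: a dense map followed by n_groups splines,
    each shared by the d_out // n_groups output neurons of its group."""
    assert d_out % n_groups == 0, "this sketch assumes equal-size groups"
    return mlp_layer_params(d_in, d_out) + n_groups * (G + K)


d_in, d_out, G, K = 784, 256, 5, 3
counts = {
    "MLP": mlp_layer_params(d_in, d_out),                          # 200,960
    "vanilla KAN": kan_layer_params(d_in, d_out, G, K),            # 1,605,632
    "grouped (hypothetical)": grouped_layer_params(d_in, d_out, G, K, n_groups=8),  # 201,024
}
for name, n in counts.items():
    print(f"{name:>24}: {n:,}")
```

In this example the vanilla KAN layer has $G+K = 8$ times as many weights as the dense layer. Sharing splines across groups adds only $n_{\text{groups}}(G+K)$ parameters on top of the MLP count.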

Abstract

Kolmogorov-Arnold Neural Networks (KANs) have gained significant attention in the machine learning community. However, their implementations often suffer from poor training stability and a large number of trainable parameters. Furthermore, there is limited understanding of the behavior of the learned activation functions derived from B-splines. In this work, we analyze the behavior of KANs through the lens of spline knots and derive lower and upper bounds on the number of knots in B-spline-based KANs. To address these limitations, we propose a novel Free-Knots KAN that improves the performance of the original KAN while reducing the number of trainable parameters to the scale of standard Multi-Layer Perceptrons (MLPs). Additionally, we introduce a new training strategy that ensures $C^2$ continuity of the learnable spline, which yields smoother activations than the original KAN. We also improve training stability through range expansion. The proposed method is comprehensively evaluated on 8 datasets spanning image, text, time-series, multimodal, and function approximation tasks. The promising results demonstrate the feasibility of KAN-based networks and the effectiveness of the proposed method.
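
The $C^2$ property can be checked numerically. The minimal Python sketch below assumes a uniform grid on $[-1, 1]$ with $G=5$ and uses hand-picked coefficients. It evaluates the splines with the Cox-de Boor recursion and measures derivative jumps at the interior knots. A degree-1 spline ($K=1$) has kinks at its knots, while a cubic spline ($K=3$) has a continuous second derivative. This demonstrates a standard property of cubic B-splines with simple knots. It is not the paper's training strategy, and all function names are hypothetical.

```python
def make_knots(grid_range, G, K):
    """Uniform grid with G intervals on grid_range, extended by K knots per side (assumed)."""
    lo, hi = grid_range
    return [lo + (hi - lo) * i / G for i in range(-K, G + K + 1)]


def spline(x, coefs, knots, K):
    """Evaluate sum_j c_j * B_{j,K}(x) using the Cox-de Boor recursion."""
    # Degree 0: indicator of the half-open interval [t_j, t_{j+1}).
    B = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    for k in range(1, K + 1):
        B = [(x - knots[j]) / (knots[j + k] - knots[j]) * B[j]
             + (knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1]) * B[j + 1]
             for j in range(len(B) - 1)]
    return sum(c * b for c, b in zip(coefs, B))


def jump_first_derivative(f, t, h=1e-2):
    """Right minus left slope at t; two-point differences are exact on linear pieces."""
    return (f(t + h) - f(t)) / h - (f(t) - f(t - h)) / h


def jump_second_derivative(f, t, h=1e-2):
    """Right minus left f''(t); four-point one-sided differences are exact on cubic pieces."""
    left = (2 * f(t) - 5 * f(t - h) + 4 * f(t - 2 * h) - f(t - 3 * h)) / h**2
    right = (2 * f(t) - 5 * f(t + h) + 4 * f(t + 2 * h) - f(t + 3 * h)) / h**2
    return right - left


def max_jump(K, coefs, jump_fn, G=5, grid_range=(-1.0, 1.0)):
    """Largest derivative jump over the grid points strictly inside grid_range."""
    knots = make_knots(grid_range, G, K)
    f = lambda x: spline(x, coefs, knots, K)
    return max(abs(jump_fn(f, t)) for t in knots[K + 1 : G + K])


# Hypothetical coefficients: G + K of them for each degree.
kink = max_jump(1, [0.0, 0.5, -0.3, 0.8, 0.1, -0.6], jump_first_derivative)
curv = max_jump(3, [0.2, -0.5, 0.9, 0.1, -0.7, 0.4, 0.3, -0.2], jump_second_derivative)
print(f"K=1: max jump in f'  = {kink:.3f}")   # kinks at the knots
print(f"K=3: max jump in f'' = {curv:.1e}")   # rounding-level only: C^2 across knots
```

The difference stencils are exact on linear and cubic pieces, respectively. Any remaining jump therefore comes from the spline itself and not from discretization error, apart from floating-point rounding.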

Paper Structure (19 sections, 4 theorems, 17 equations, 5 figures, 2 tables)

Introduction
Related Works
Kolmogorov-Arnold Network Architecture
Upper Bound of Knots for KAN
Number of Knots for Single Layer KAN
Number of Knots for Multi Layer KAN
Free-Knot KAN
Neuron Grouping and Share Weight
Free Grid
Smooth Oscillation and Training Stability
Experiment
Dataset and Implementation Setting
Classic Deep Learning Task
Function Approximation
Can FR-KAN Find Better Activation than KAN?
...and 4 more sections

Key Result

Lemma 4.1

Any neuron $n_i \in \mathcal{F}_r^1$ can generate at most one knot in the current spline. A knot is considered unique if its position differs from those of all other knots. The number of knots in the spline generated by $\mathcal{F}_r^1$ is at most the number of neurons and depends on the number of unique knots…
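
The counting argument in Lemma 4.1 can be illustrated with a toy sketch. Each neuron contributes at most one candidate knot position, or none. Only distinct positions add knots, so the total never exceeds the number of neurons. The per-neuron positions and the merge tolerance `tol` are hypothetical inputs. How a neuron actually produces its knot is defined in the paper and is not modeled here.

```python
def count_unique_knots(candidate_knots, tol=1e-9):
    """Number of distinct knots contributed by a set of neurons.

    candidate_knots holds one entry per neuron: a knot position (float), or None
    when that neuron contributes no knot. Positions within tol of the previously
    kept knot are treated as the same knot (hypothetical tolerance).
    """
    positions = sorted(p for p in candidate_knots if p is not None)
    unique = []
    for p in positions:
        # A knot counts only if its position differs from the last kept knot.
        if not unique or p - unique[-1] > tol:
            unique.append(p)
    return len(unique)


# Hypothetical per-neuron knot positions for a layer with five neurons.
neurons = [0.3, -0.2, 0.3, None, 0.75]
n_knots = count_unique_knots(neurons)
print(n_knots)                  # 3 unique positions: -0.2, 0.3, 0.75
assert n_knots <= len(neurons)  # never more knots than neurons
```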

Figures (5)

Figure 1: Preliminary Experiment: fitting the function $\frac{1}{1+25x^2}$. (a), (c), (e), (g): activated features after the input layer and after activation summation. (b), (d), (f), (h): function approximation results.
Figure 2: Performance Comparison. Row 1: image classification datasets. Row 2: multimodal (AVMNIST, MIMIC-III), AG NEWS (text classification), and ETTh1 (time-series forecasting).
Figure 3: Visualization of Complex Learned Activations and Function Approximation for KAN and FR-KAN
Figure 4: Large Grid Range to Stabilize Training: accuracy vs. training step on the STL10 dataset over more than 1200 steps. KAN [-1, 1] and FR-KAN [-1, 1] stop early due to NaN loss.
Figure 5: A spline activation example with $K=1$ and $G=5$. Left: B-spline basis functions $B_{j,k}$. Right: spline activation function $\sum_{j=0}^{G} c_j B_j(x_i)$. New knots in the layer-2 activation are marked with $\bigcirc$.
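
The activation in Figure 5 can be sketched in Python as $\phi(x) = \sum_{j=0}^{G} c_j B_{j,K}(x)$ with $K=1$ and $G=5$. This minimal example assumes a uniform grid on $[-1, 1]$ extended by $K$ knots on each side, which is a common KAN convention, and uses hypothetical coefficients $c_j$. With $K=1$ the basis functions are hat functions, so $\phi$ takes the value $c_j$ at knot $t_{j+1}$ and interpolates linearly between knots.

```python
def make_knots(grid_range, G, K):
    """Uniform grid with G intervals on grid_range, extended by K knots per side (assumed)."""
    lo, hi = grid_range
    return [lo + (hi - lo) * i / G for i in range(-K, G + K + 1)]


def bspline_basis(x, knots, K):
    """Values B_{j,K}(x) for j = 0 .. len(knots) - K - 2 via the Cox-de Boor recursion."""
    # Degree 0: indicator of the half-open interval [t_j, t_{j+1}).
    B = [1.0 if knots[j] <= x < knots[j + 1] else 0.0 for j in range(len(knots) - 1)]
    # Raise the degree one step at a time; each step shortens the list by one.
    for k in range(1, K + 1):
        B = [
            (x - knots[j]) / (knots[j + k] - knots[j]) * B[j]
            + (knots[j + k + 1] - x) / (knots[j + k + 1] - knots[j + 1]) * B[j + 1]
            for j in range(len(B) - 1)
        ]
    return B


def spline_activation(x, coefs, knots, K):
    """phi(x) = sum_j c_j * B_{j,K}(x)."""
    return sum(c * b for c, b in zip(coefs, bspline_basis(x, knots, K)))


G, K = 5, 1
knots = make_knots((-1.0, 1.0), G, K)     # -1.4, -1.0, -0.6, ..., 1.4 (G + 2K + 1 = 8 knots)
coefs = [0.0, 0.5, -0.3, 0.8, 0.1, -0.6]  # hypothetical c_0 .. c_G (G + K = 6 values)

for x in (-1.2, -0.6, 0.0, 0.9, 1.5):
    print(f"phi({x:+.1f}) = {spline_activation(x, coefs, knots, K):+.3f}")
```

Every basis function has compact support, so $\phi$ is exactly zero once an input leaves the extended knot span, as at $x = 1.5$ here. The grid range therefore determines where an activation can respond at all. Figure 4 relates this range to training stability.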

Theorems & Definitions (4)

Lemma 4.1
Lemma 4.2
Theorem 4.3
Theorem 5.1
