On the expressiveness and spectral bias of KANs

Yixuan Wang; Jonathan W. Siegel; Ziming Liu; Thomas Y. Hou

TL;DR

The paper studies the expressiveness and spectral-learning behavior of Kolmogorov-Arnold Networks (KANs) relative to standard MLPs. It proves that any ReLU^k MLP can be represented by a KAN with k-th order B-splines and comparable parameter counts, while a KAN can be represented by an MLP with width scaling tied to the grid, implying comparable or superior expressiveness for KANs with larger grids. For Sobolev spaces, very deep KANs achieve approximation rates of $O(P^{-2s/d})$, surpassing traditional rates in certain regimes. Experiments show KANs have reduced spectral bias across 1D frequency tasks, Gaussian random fields, and high-frequency PDEs, with grid extension aiding high-frequency learning at the cost of potential overfitting.
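The building block behind the MLP-to-KAN direction can be checked numerically: on a bounded interval, $\mathrm{ReLU}(x)^k$ is a piecewise polynomial of degree $k$ with a single break at $0$, so it lies in the span of degree-$k$ B-splines whenever $0$ is a grid point. The sketch below is only an illustration of that one-dimensional fact, not the paper's construction for a full network; the helper names, the interval $[-1, 1)$, and the grid sizes are assumptions chosen for the example.

```python
import numpy as np

def bspline_basis(x, knots, k):
    """All degree-k B-spline basis functions at points x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Degree 0: indicators of the half-open knot intervals [t_i, t_{i+1}).
    B = [((knots[i] <= x) & (x < knots[i + 1])).astype(float)
         for i in range(len(knots) - 1)]
    for d in range(1, k + 1):
        # Raise the degree by one; the list shrinks by one entry per step.
        B = [(x - knots[i]) / (knots[i + d] - knots[i]) * B[i]
             + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * B[i + 1]
             for i in range(len(knots) - d - 1)]
    return np.stack(B, axis=1)  # shape (len(x), G + k)

def relu_k_fit_error(G, k, n_samples=400):
    """Max error of the least-squares degree-k spline fit to ReLU(x)^k on [-1, 1)."""
    h = 2.0 / G
    # Uniform grid with G intervals on [-1, 1], extended by k knots on each side.
    knots = -1.0 + h * np.arange(-k, G + k + 1)
    x = np.linspace(-1.0, 1.0, n_samples, endpoint=False)
    target = np.maximum(x, 0.0) ** k
    B = bspline_basis(x, knots, k)
    coef, *_ = np.linalg.lstsq(B, target, rcond=None)
    return np.max(np.abs(B @ coef - target))

# Hypothetical settings: k = 2, with and without a grid point at 0.
err_knot_at_zero = relu_k_fit_error(G=4, k=2)     # 0 is a grid point
err_no_knot_at_zero = relu_k_fit_error(G=5, k=2)  # 0 falls inside a grid cell
print(err_knot_at_zero, err_no_knot_at_zero)
```

With an even grid size the fit error should sit at the level of floating-point round-off, while the odd grid leaves a visible residual because the kink of $\mathrm{ReLU}^k$ falls strictly inside a cell.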

Abstract

Kolmogorov-Arnold Networks (KANs) \cite{liu2024kan} were very recently proposed as a potential alternative to the prevalent architectural backbone of many deep learning models, the multi-layer perceptron (MLP). KANs have seen success in various tasks of AI for science, with their empirical efficiency and accuracy demonstrated in function regression, PDE solving, and many more scientific problems. In this article, we revisit the comparison of KANs and MLPs, with emphasis on a theoretical perspective. On the one hand, we compare the representation and approximation capabilities of KANs and MLPs. We establish that MLPs can be represented using KANs of a comparable size. This shows that the approximation and representation capabilities of KANs are at least as good as MLPs. Conversely, we show that KANs can be represented using MLPs, but that in this representation the number of parameters increases by a factor of the KAN grid size. This suggests that KANs with a large grid size may be more efficient than MLPs at approximating certain functions. On the other hand, from the perspective of learning and optimization, we study the spectral bias of KANs compared with MLPs. We demonstrate that KANs are less biased toward low frequencies than MLPs. We highlight that the multi-level learning feature specific to KANs, i.e., grid extension of splines, improves the learning process for high-frequency components. Detailed comparisons with different choices of depth, width, and grid sizes of KANs are made, shedding some light on how to choose the hyperparameters in practice.
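Grid extension, as mentioned in the abstract, moves a learned spline onto a finer grid without changing the function it represents, after which training can continue with more degrees of freedom. The sketch below is a deliberately simplified illustration that uses order-1 (piecewise-linear) splines, where the coefficients coincide with the values at the knots; the function names and grid sizes are assumptions, and higher-order B-splines would need a fit (for example, least squares) instead of plain evaluation at the new knots.

```python
import numpy as np

def spline1(x, grid, coef):
    """Order-1 (piecewise-linear) spline; coefficients are the values at the knots."""
    return np.interp(x, grid, coef)

def extend_grid(grid_coarse, coef_coarse, grid_fine):
    """Coefficients on grid_fine that reproduce the coarse spline.

    Exact when grid_fine contains every knot of grid_coarse (nested grids).
    """
    return spline1(grid_fine, grid_coarse, coef_coarse)

rng = np.random.default_rng(0)
grid_coarse = np.linspace(-1.0, 1.0, 5)  # G = 4 intervals (hypothetical size)
grid_fine = np.linspace(-1.0, 1.0, 9)    # G = 8 intervals, contains the coarse knots
coef_coarse = rng.normal(size=grid_coarse.size)  # stand-in for learned coefficients
coef_fine = extend_grid(grid_coarse, coef_coarse, grid_fine)

# The two parametrizations should describe the same function.
x = np.linspace(-1.0, 1.0, 1001)
gap = np.max(np.abs(spline1(x, grid_coarse, coef_coarse)
                    - spline1(x, grid_fine, coef_fine)))
print(gap)
```

Because the fine grid is nested in the coarse one, the extended spline starts from the same function and only gains capacity, which is the sense in which grid extension acts as a multi-level refinement step.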

Paper Structure (18 sections, 5 theorems, 41 equations, 7 figures)

Introduction
Our contribution.
Prior Work
Representation and Approximation
Review of the KAN Architecture
KAN architecture
Grid extension
Approximation theory, KAT
Reparametrization of KANs and MLPs
Spectral Bias
Spectral bias theory for shallow KANs
1D waves of different frequencies
Gaussian random field
PDE example
Concluding Remarks
...and 3 more sections

Key Result

Theorem 3.1

Let $\mathbf{x}=(x_1,x_2,\cdots,x_n)$. Suppose that a function $f(\mathbf{x})$ admits a representation as in \eqref{eq:KAN_forward}, where each one of the $\Phi_{l,i,j}$ is $(k+1)$-times continuously differentiable. Then there exists a constant $C$ depending on $f$ and its representation, such that we have the following approximation bound in terms of the grid size $G$: there exist $k$-th order B-spline

Figures (7)

Figure 1: 1D wave dataset, where the target function has equal amplitudes of different frequency modes. Under various hyperparameters, MLPs manifest strong spectral biases (top), while KANs do not (bottom). Note that the y axis (training steps) of MLP is 10 times that of KAN.
Figure 2: The Gaussian random field dataset. Training losses of MLP and KANs, with different scales and dimensions.
Figure 3: The Gaussian random field dataset. Test losses of MLP and KANs, with different scales and dimensions. Increasing the number of samples by $10$x helps reduce overfitting.
Figure 4: Solving PDEs. $L^2$ and $H^1$ losses of MLP and KAN with different frequencies of the solution.
Figure 5: 2D Poisson. Losses of MLP and KAN with different frequencies of the solution.
...and 2 more figures
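
A target with equal amplitudes across several frequency modes, as in Figure 1, makes spectral bias easy to quantify: one can compare the recovered amplitude of each mode against the target. The sketch below is an assumption-laden illustration, not the paper's experimental code; the frequencies, the sample count, and the stand-in "prediction" that captures only the two lowest modes are all hypothetical, and the amplitude comparison ignores phase errors.

```python
import numpy as np

def mode_amplitudes(values, freqs):
    """Amplitude of each integer frequency in `freqs` for samples on a uniform grid over [0, 1)."""
    spectrum = np.fft.rfft(values)
    return 2.0 * np.abs(spectrum[list(freqs)]) / values.size

freqs = [1, 2, 4, 8]          # hypothetical frequency modes
x = np.arange(256) / 256.0    # uniform samples on [0, 1)
target = sum(np.sin(2 * np.pi * f * x) for f in freqs)

# Stand-in for a spectrally biased model: only the two lowest modes are captured.
low_freq_pred = sum(np.sin(2 * np.pi * f * x) for f in freqs[:2])

target_amp = mode_amplitudes(target, freqs)
mode_error = np.abs(mode_amplitudes(low_freq_pred, freqs) - target_amp)
print(target_amp, mode_error)
```

Tracking `mode_error` over training steps for a real model would give a per-frequency learning curve of the kind that Figure 1 visualizes.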

Theorems & Definitions (12)

Theorem 3.1
Theorem 3.2
Theorem 3.3
Corollary 3.4
Remark 3.5
Theorem 4.1
Remark 4.2
Remark 4.3
Remark 4.4
Proof of Theorem \ref{mlp-kan-representation-thm}
...and 2 more
