Reduced Effectiveness of Kolmogorov-Arnold Networks on Functions with Noise

Haoran Shen; Chen Zeng; Jiahui Wang; Qiao Wang

TL;DR

This work addresses the vulnerability of Kolmogorov-Arnold networks (KANs) to noise in the data and evaluates two mitigation strategies: kernel filtering with a Gaussian-like kernel inspired by diffusion maps, and oversampling combined with denoising based on frame theory. It shows that enlarging the training data by a factor of $r$ reduces the test loss approximately as $\text{test-loss} \sim \mathcal{O}(r^{-\frac{1}{2}})$, whereas kernel filtering helps mainly in low-SNR regimes, requires careful selection of $\sigma$, and is not universally beneficial. The study further finds that combining oversampling with kernel filtering does not yield consistent improvements: filtering can offset the benefits of oversampling, and data costs rise substantially. Overall, although the proposed approaches mitigate some effects of noise, noise remains a significant challenge for KANs in practical applications.
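The TL;DR describes the filter only as a Gaussian-like kernel guided by diffusion-map ideas; the exact kernel and normalization are not given here. A minimal sketch, assuming a diffusion-map-style construction (Gaussian affinities $W_{ij} = \exp(-\lVert x_i - x_j\rVert^2/\sigma^2)$, row-normalized into a Markov matrix $P = D^{-1}W$, with filtered values $Py$), could look like the following. The function name `kernel_filter`, the test function $\sin(\pi x)$ and the noise level are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


def kernel_filter(x, y, sigma):
    """Smooth noisy samples y observed at locations x (assumed diffusion-map-style filter).

    W_ij = exp(-||x_i - x_j||^2 / sigma^2); P = D^{-1} W (row-stochastic); returns P @ y.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat 1-D inputs as n points in R^1
    # Pairwise squared distances between all sample locations, shape (n, n)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / sigma**2)
    # Row-normalize so every row sums to 1 (a Markov transition matrix)
    P = W / W.sum(axis=1, keepdims=True)
    return P @ np.asarray(y, dtype=float)


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Usage on a hypothetical 1-D test function with additive Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
clean = np.sin(np.pi * x)
noisy = clean + 0.3 * rng.standard_normal(x.shape)
filtered = kernel_filter(x, noisy, sigma=0.1)
print(f"noisy RMSE: {rmse(noisy, clean):.3f}, filtered RMSE: {rmse(filtered, clean):.3f}")
```

Because each row of $P$ sums to one, a constant signal passes through unchanged; a larger $\sigma$ averages over wider neighborhoods, trading stronger noise suppression for more bias near sharp features and at the boundaries.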

Abstract

It has been observed that even a small amount of noise in the dataset can significantly degrade the performance of KANs. In this brief note, we quantitatively evaluate KAN performance when noise is added to the dataset. We propose an oversampling technique combined with denoising to alleviate the impact of noise. Specifically, we apply kernel filtering based on diffusion maps to pre-filter the noisy data before training the KAN. Our experiments show that, when i.i.d. noise is added at any fixed SNR and the amount of training data is increased by a factor of $r$, the test loss (RMSE) of KANs follows a trend of $\text{test-loss} \sim \mathcal{O}(r^{-\frac{1}{2}})$ as $r\to +\infty$. We conclude that applying oversampling and filtering strategies can reduce the detrimental effects of noise. Nevertheless, determining the optimal variance for the kernel filtering process is challenging, and increasing the volume of training data substantially raises the associated costs, because the training dataset must be expanded several-fold relative to the initial clean data. As a result, noise in the data ultimately diminishes the effectiveness of Kolmogorov-Arnold networks.
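To make the abstract's setting concrete, the sketch below adds i.i.d. Gaussian noise at a fixed SNR and shows, for the simplest possible denoiser (averaging $r$ independent noisy copies of each sample), why an $\mathcal{O}(r^{-1/2})$ trend is natural: the standard deviation of an average of $r$ i.i.d. noise samples shrinks by a factor of $1/\sqrt{r}$. This is only an intuition aid under stated assumptions, not the paper's KAN experiment or its frame-theoretic reconstruction; the SNR convention (signal power over noise variance, in dB), the helper names `add_noise_at_snr` and `averaged_rmse`, and the test function are all hypothetical.

```python
import numpy as np


def add_noise_at_snr(y, snr_db, rng):
    """Add i.i.d. Gaussian noise with variance mean(y**2) / 10**(snr_db / 10) (assumed SNR convention)."""
    noise_var = np.mean(y ** 2) / 10 ** (snr_db / 10)
    return y + np.sqrt(noise_var) * rng.standard_normal(y.shape)


def averaged_rmse(r, snr_db, n=256, trials=20, seed=0):
    """Mean RMSE when each of n samples of sin(pi x) is observed r times and averaged."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    clean = np.sin(np.pi * x)
    errors = []
    for _ in range(trials):
        copies = np.stack([add_noise_at_snr(clean, snr_db, rng) for _ in range(r)])
        estimate = copies.mean(axis=0)  # simplest denoiser: per-location averaging
        errors.append(np.sqrt(np.mean((estimate - clean) ** 2)))
    return float(np.mean(errors))


ratios = np.array([1, 2, 4, 8, 16, 32])
losses = np.array([averaged_rmse(int(r), snr_db=10) for r in ratios])
# Slope of log(RMSE) against log(r); an O(r^{-1/2}) trend gives a slope near -0.5
slope = np.polyfit(np.log(ratios), np.log(losses), 1)[0]
print(f"fitted exponent: {slope:.2f}")
```

Fitting the exponent on a log-log scale, as above, is one way to check an empirical $\mathcal{O}(r^{-1/2})$ trend; for a trained KAN, the per-$r$ losses would be measured test RMSE values rather than this closed-form toy.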

Paper Structure

This paper contains 10 sections, 8 equations, 7 figures, and 3 tables.

Introduction
The impact of noise in KANs
Denoise by Kernel filtering
Kernel Filtering
The Optimal Filter Parameter $\sigma$
Enhance training dataset to mitigate noise
Reconstruct Signal from Noisy Oversampled Data
Increase Training Samples
Combining oversampling and kernel filtering
Conclusion

Figures (7)

Figure 1: Applying kernel filtering to $f_2$ and $f_3$ with $\sigma=0.1$.
Figure 2: Applying kernel filtering to $f_2$ with different $\sigma$.
Figure 3: Filtering performance with different values of $\sigma$ under various SNRs.
Figure 4: Applying oversampling to $f_1$ with different SNRs.
Figure 5: Applying oversampling to $f_2$ with different SNRs.
...and 2 more figures
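Figures 2 and 3 concern how the choice of $\sigma$ affects filtering under different SNRs. A toy version of such a sweep can be sketched as follows, reusing an assumed row-normalized Gaussian kernel as a hypothetical stand-in for the paper's filter; the candidate grid, the test function $\sin(\pi x)$ and the helper `best_sigma` are illustrative assumptions. The sweep picks $\sigma$ by comparing against the clean function, which is only possible in synthetic experiments, so it also illustrates why selecting $\sigma$ on real noisy data is not straightforward.

```python
import numpy as np


def kernel_filter(x, y, sigma):
    """Row-normalized Gaussian kernel smoother on 1-D inputs (assumed form of the filter)."""
    W = np.exp(-((x[:, None] - x[None, :]) ** 2) / sigma**2)
    return (W / W.sum(axis=1, keepdims=True)) @ y


def best_sigma(snr_db, sigmas, n=200, seed=0):
    """Oracle sweep: pick the sigma whose filtered output is closest (RMSE) to the clean signal."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, n)
    clean = np.sin(np.pi * x)
    noise_std = np.sqrt(np.mean(clean ** 2) / 10 ** (snr_db / 10))
    noisy = clean + noise_std * rng.standard_normal(n)
    errors = [np.sqrt(np.mean((kernel_filter(x, noisy, s) - clean) ** 2)) for s in sigmas]
    i = int(np.argmin(errors))
    raw = np.sqrt(np.mean((noisy - clean) ** 2))
    return float(sigmas[i]), float(errors[i]), float(raw)


sigmas = np.array([0.01, 0.02, 0.05, 0.1, 0.2, 0.5])
results = {}
for snr_db in [0, 10, 20, 30]:
    results[snr_db] = best_sigma(snr_db, sigmas)
    s, filtered, raw = results[snr_db]
    print(f"SNR {snr_db:>2} dB: best sigma = {s:.2f}, filtered RMSE = {filtered:.3f}, raw RMSE = {raw:.3f}")
```

In this toy setting the best $\sigma$ shifts toward larger values as the SNR drops, since heavier smoothing pays off only when the noise dominates the bias it introduces.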
