KAN or MLP: A Fairer Comparison

Runpeng Yu; Weihao Yu; Xinchao Wang

TL;DR

This work performs a fair, parameter- and FLOP-controlled comparison between Kolmogorov–Arnold Networks (KAN) and MLPs across multiple domains, finding that MLP generally outperforms KAN except in symbolic formula representation, where KAN has an edge. It identifies the learnable B-spline activation as the key factor behind KAN’s advantage there, showing that an MLP equipped with B-spline activations matches or exceeds KAN on symbolic formula representation. Further ablations show that the benefit of spline activations is task-dependent: on tasks where MLP already outperforms KAN, B-splines do not substantially improve MLP. In a standard class-incremental continual learning setup, KAN forgets more severely than MLP, which differs from the findings reported in the KAN paper. The results offer practical guidance for future work on KAN and other MLP alternatives, indicating where spline activations help and where they do not.

Abstract

This paper does not introduce a novel method. Instead, it offers a fairer and more comprehensive comparison of KAN and MLP models across various tasks, including machine learning, computer vision, audio processing, natural language processing, and symbolic formula representation. Specifically, we control the number of parameters and FLOPs to compare the performance of KAN and MLP. Our main observation is that, except for symbolic formula representation tasks, MLP generally outperforms KAN. We also conduct ablation studies on KAN and find that its advantage in symbolic formula representation mainly stems from its B-spline activation function. When B-spline is applied to MLP, performance in symbolic formula representation significantly improves, surpassing or matching that of KAN. However, in other tasks where MLP already excels over KAN, B-spline does not substantially enhance MLP's performance. Furthermore, we find that KAN's forgetting issue is more severe than that of MLP in a standard class-incremental continual learning setting, which differs from the findings reported in the KAN paper. We hope these results provide insights for future research on KAN and other MLP alternatives. Project link: https://github.com/yu-rp/KANbeFair
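To make the idea of a parameter-controlled comparison concrete, the sketch below counts parameters for both model families and picks an MLP hidden width that fits within a KAN's parameter budget. It is only an illustration under stated assumptions, not the procedure used in the paper: the MLP count (weights plus biases) is standard, while the KAN count assumes each edge carries `grid + order` B-spline coefficients, with `extra_per_edge` as a hypothetical knob for any additional per-edge scalars an implementation may keep. The function names and the layer widths are made up for the example.

```python
def mlp_params(widths):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(widths, widths[1:]))


def kan_params(widths, grid=5, order=3, extra_per_edge=0):
    """Assumed KAN count: every edge holds (grid + order) B-spline coefficients.

    `extra_per_edge` is a hypothetical knob for extra per-edge scalars
    (for example scaling factors); it is not taken from the source.
    """
    per_edge = grid + order + extra_per_edge
    return sum(d_in * d_out * per_edge for d_in, d_out in zip(widths, widths[1:]))


def match_mlp_hidden(d_in, d_out, budget):
    """Largest hidden width of a one-hidden-layer MLP that stays within `budget`.

    Returns 1 if even the smallest MLP exceeds the budget.
    """
    hidden = 1
    while mlp_params([d_in, hidden + 1, d_out]) <= budget:
        hidden += 1
    return hidden


# Illustrative widths only: a small KAN and the MLP matched to its budget.
budget = kan_params([784, 32, 10])
hidden = match_mlp_hidden(784, 10, budget)
print(f"KAN parameters: {budget}")
print(f"Matched MLP hidden width: {hidden} ({mlp_params([784, hidden, 10])} parameters)")
```

The same pattern would apply to a FLOP-controlled comparison by swapping the two counting functions for FLOP estimates, which are not sketched here.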

Paper Structure (10 sections, 8 equations, 15 figures, 1 table)

Introduction
Formulation of KAN and MLP
Number of Parameters of KAN and MLP
FLOPs of KAN and MLP
Experiments
Performance Comparison
Architecture Ablation
Continual Learning
Related Works
Conclusion

Figures (15)

Figure 1: Performance comparison between KAN and MLP under a fair setup. MLP yields higher average accuracy in machine learning, computer vision, natural language processing, and audio processing, while KAN achieves a lower average root mean square error (RMSE) on the Symbolic Formula Representation task, where lower RMSE is better.
Figure 2: Performance comparison between KAN and MLP on Machine Learning datasets controlling the number of parameters.
Figure 3: Performance comparison between KAN and MLP on Machine Learning datasets controlling FLOPs.
Figure 4: Performance comparison between KAN and MLP on Computer Vision datasets controlling the number of parameters.
Figure 5: Performance comparison between KAN and MLP on Computer Vision datasets controlling FLOPs.
...and 10 more figures
