A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks

Alessandro Cacciatore; Valerio Morelli; Federica Paganica; Emanuele Frontoni; Lucia Migliorelli; Daniele Berardini

TL;DR

The paper investigates continual learning (CL) in computer vision (CV) using Kolmogorov-Arnold Networks (KANs), comparing an MLP with two KAN-based models (EfficientKAN and PyKAN) in a class-incremental MNIST setting where all models have the same number of trainable parameters. EfficientKAN outperforms both the MLP and the original PyKAN in final accuracy on the MNIST CL tasks (52% vs 40% vs 28%), and the study also analyzes the influence of hyperparameters and of trainable components such as biases and scale factors. It highlights the locality of KANs' spline-based activations as a potential advantage for CL in simple domains, but notes challenges in higher-dimensional CV tasks and the substantial compute demands of PyKAN, for which EfficientKAN offers a more scalable alternative. Overall, the results suggest that KANs are a viable, though still exploratory, direction for CL in CV, motivating further work on KAN-based CNNs, hyperparameter sensitivity, and practical pruning and symbolic interpretation in continual settings.
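As a rough illustration of the class-incremental protocol, the sketch below partitions a labelled dataset into sequential tasks with disjoint class sets. In this scenario the model is trained on the tasks one after another and, at test time, must classify among all classes seen so far without being told which task a sample came from. The helper `class_incremental_split` is hypothetical, and the layout of five tasks with two consecutive digits each is an assumption (the common Split-MNIST convention), not necessarily the task ordering used in the paper.

```python
import numpy as np


def class_incremental_split(labels, classes_per_task=2):
    """Group sample indices into sequential tasks with disjoint class sets.

    Hypothetical helper: classes are assigned to tasks in ascending order,
    `classes_per_task` at a time (assumed Split-MNIST layout).
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted distinct class labels
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = classes[start:start + classes_per_task]
        # Indices of every sample whose label belongs to this task's classes.
        idx = np.flatnonzero(np.isin(labels, task_classes))
        tasks.append((task_classes.tolist(), idx))
    return tasks


# Toy labels standing in for MNIST targets.
toy_labels = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9])
for task_id, (cls, idx) in enumerate(class_incremental_split(toy_labels)):
    print(task_id, cls, idx.tolist())
```

Each task's index array can then be wrapped in a data loader (for example via `torch.utils.data.Subset`) and trained on in order, with evaluation on the union of classes seen so far after each task.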

Abstract

Deep learning has long been dominated by multi-layer perceptrons (MLPs), which have demonstrated superiority over other optimizable models in various domains. Recently, a new alternative to MLPs has emerged: Kolmogorov-Arnold Networks (KANs), which are based on a fundamentally different mathematical framework. According to their authors, KANs address several major issues of MLPs, such as catastrophic forgetting in continual learning scenarios. However, this claim has only been supported by results from a regression task on a toy 1D dataset. In this paper, we extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision, specifically on MNIST. To this end, we conduct a structured analysis of the behavior of MLPs and two KAN-based models in a class-incremental learning scenario, ensuring that all architectures involved have the same number of trainable parameters. Our results demonstrate that an efficient version of KAN outperforms both traditional MLPs and the original KAN implementation. We further analyze the influence of hyperparameters in MLPs and KANs, as well as the impact of certain trainable parameters in KANs, such as bias and scale weights. Additionally, we provide a preliminary investigation of recent KAN-based convolutional networks and compare their performance with that of traditional convolutional neural networks. Our code can be found at https://github.com/MrPio/KAN-Continual_Learning_tests.
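Matching trainable parameters across MLPs and KANs requires a per-layer count for each model. The sketch below assumes an MLP layer with weights and biases, and a KAN layer in which every edge carries `grid_size + spline_order` B-spline coefficients plus one base weight and one spline scale, which is roughly how EfficientKAN's default layer is organized; the exact count depends on the implementation and on which scales and biases are trainable. The helper names and the `[784, 32, 10]` KAN widths with `grid_size=5`, `spline_order=3` are hypothetical and chosen only for illustration.

```python
def mlp_params(widths):
    """Trainable parameters of a fully connected MLP (weights + biases)."""
    return sum(a * b + b for a, b in zip(widths[:-1], widths[1:]))


def kan_params(widths, grid_size, spline_order):
    """Assumed KAN count: per edge, (grid_size + spline_order) spline
    coefficients + 1 base weight + 1 spline scale; no biases."""
    per_edge = grid_size + spline_order + 2
    return sum(a * b * per_edge for a, b in zip(widths[:-1], widths[1:]))


def matching_mlp_hidden(target, n_in, n_out):
    """Hidden width h of an [n_in, h, n_out] MLP whose parameter count is
    closest to `target`, using params(h) = h * (n_in + 1 + n_out) + n_out."""
    return max(1, round((target - n_out) / (n_in + 1 + n_out)))


kan_total = kan_params([784, 32, 10], grid_size=5, spline_order=3)
h = matching_mlp_hidden(kan_total, n_in=784, n_out=10)
print(kan_total, h, mlp_params([784, h, 10]))  # 254080 320 254410
```

With a single hidden layer the MLP count is linear in the hidden width, so rounding gives the closest match; deeper MLPs would need a small search over widths instead.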

Paper Structure (20 sections, 15 equations, 20 figures, 1 table)

Introduction to Kolmogorov-Arnold Networks
The Kolmogorov-Arnold Theorem
Kolmogorov-Arnold Networks
Practical KAN implementation for continual learning
KAN-based neural networks: EfficientKAN
State of the art
State of the art on continual learning
Continual Learning scenarios
Continual Learning strategies
State of the art on KAN in computer vision
Architectures and hyper-parameters
Choosing the right architectures for fair comparison
Training protocol: Class incremental learning and hyper-parameters
Results
Discussions
...and 5 more sections

Figures (20)

Figure 1: A simple two-layer KAN, where the activation functions are arranged on the edges and the nodes sum their inputs. The number of outer functions does not comply with the Kolmogorov-Arnold theorem (KAT), which requires $2n+1=9$ outer functions for $n=4$ inputs; this network has fewer.
Figure 2: Toy example of KAN's ability in CL scenarios. The model is trained on a 1D regression dataset by feeding in the points from each peak sequentially. A [1,1] KAN with grid size 200 and spline order 3 fits the data points perfectly, and new data points do not appear to affect previously learned knowledge.
Figure 3: Test accuracy plot for MLP.
Figure 4: Confusion matrix for MLP.
Figure 5: Test accuracy plot for PyKAN.
...and 15 more figures
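The locality illustrated in Figure 2 can be reproduced on a single KAN edge. An order-$k$ B-spline basis function is nonzero on only $k+1$ grid intervals, so a gradient step on a sample at $x$ changes only the (at most $k+1$) coefficients whose basis functions cover $x$, and the edge's output far away stays exactly as it was. The sketch below evaluates the basis with the Cox-de Boor recursion on a uniform grid over $[-1, 1]$ extended by `spline_order` knots on each side (an assumed convention, similar in spirit to PyKAN and EfficientKAN). The names `bspline_basis` and `phi`, the learning rate, and the two input points are hypothetical; the grid size 200 and spline order 3 follow the Figure 2 caption.

```python
import numpy as np


def bspline_basis(x, grid_size=200, spline_order=3, lo=-1.0, hi=1.0):
    """Evaluate all B-spline basis functions of one KAN edge at scalar x."""
    h = (hi - lo) / grid_size
    # Uniform knots on [lo, hi], extended by `spline_order` knots on each side.
    knots = lo + h * np.arange(-spline_order, grid_size + spline_order + 1)
    # Order-0 bases: indicator of the knot interval that contains x.
    basis = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    # Cox-de Boor recursion raises the order one step at a time.
    for p in range(1, spline_order + 1):
        left = (x - knots[:-(p + 1)]) / (knots[p:-1] - knots[:-(p + 1)])
        right = (knots[p + 1:] - x) / (knots[p + 1:] - knots[1:-p])
        basis = left * basis[:-1] + right * basis[1:]
    return basis  # shape: (grid_size + spline_order,)


G, K = 200, 3
coef = np.zeros(G + K)  # learnable spline coefficients of a single edge


def phi(x):
    """Edge activation: a linear combination of the B-spline bases."""
    return bspline_basis(x, G, K) @ coef


x_old, x_new = -0.503, 0.497  # samples from an "old" and a "new" task
before = phi(x_old)

# One SGD step on (phi(x_new) - 1)^2; its gradient w.r.t. coef is
# 2 * (phi(x_new) - 1) * basis(x_new), which is sparse.
b_new = bspline_basis(x_new, G, K)
lr = 0.5
coef -= lr * 2 * (phi(x_new) - 1.0) * b_new

print(np.count_nonzero(b_new), phi(x_old) == before)  # 4 True
```

By contrast, a gradient step on a dense MLP layer generally updates weights that influence the output for every input, which is one intuition for why the locality of spline-based activations is studied as a potential advantage in continual learning.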
