A preliminary study on continual learning in computer vision using Kolmogorov-Arnold Networks
Alessandro Cacciatore, Valerio Morelli, Federica Paganica, Emanuele Frontoni, Lucia Migliorelli, Daniele Berardini
TL;DR
The paper investigates continual learning (CL) in computer vision (CV) using Kolmogorov-Arnold Networks (KANs), comparing MLPs with two KAN-based models (EfficientKAN and PyKAN) in a class-incremental MNIST setting with equal numbers of trainable parameters. It shows that EfficientKAN outperforms both the MLP and the original PyKAN in final accuracy on MNIST CL tasks (52% vs. 40% and 28%, respectively), and it analyzes the influence of hyperparameters and of trainable components such as bias and scale factors. The study highlights the locality of KANs' spline-based activations as a potential advantage for continual learning in simple domains, but notes challenges in higher-dimensional CV tasks and substantial compute demands for PyKAN, with EfficientKAN offering a more scalable alternative. Overall, the results suggest KANs are a viable, though still exploratory, direction for CL in CV, motivating further work on KAN-based CNNs, hyperparameter sensitivity, and practical pruning and symbolic interpretation in continual settings.
Abstract
Deep learning has long been dominated by multi-layer perceptrons (MLPs), which have demonstrated superiority over other optimizable models in various domains. Recently, a new alternative to MLPs has emerged: Kolmogorov-Arnold Networks (KANs), which are based on a fundamentally different mathematical framework. According to their authors, KANs address several major issues in MLPs, such as catastrophic forgetting in continual learning scenarios. However, this claim has only been supported by results from a regression task on a toy 1D dataset. In this paper, we extend the investigation by evaluating the performance of KANs in continual learning tasks within computer vision, specifically using the MNIST datasets. To this end, we conduct a structured analysis of the behavior of MLPs and two KAN-based models in a class-incremental learning scenario, ensuring that the architectures involved have the same number of trainable parameters. Our results demonstrate that an efficient version of KAN outperforms both traditional MLPs and the original KAN implementation. We further analyze the influence of hyperparameters in MLPs and KANs, as well as the impact of certain trainable parameters in KANs, such as bias and scale weights. Additionally, we provide a preliminary investigation of recent KAN-based convolutional networks and compare their performance with that of traditional convolutional neural networks. Our code is available at https://github.com/MrPio/KAN-Continual_Learning_tests.
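
As a rough illustration of the class-incremental setup described above, the following minimal sketch groups sample indices into tasks that each introduce a disjoint set of classes, and counts the trainable parameters of a plain MLP so that a parameter budget can be matched across architectures. It is not taken from the authors' repository: the function names, the choice of two classes per task, and the 784-128-10 layer sizes are all assumptions made for the example.

```python
def class_incremental_tasks(labels, classes_per_task=2):
    """Group sample indices into tasks, each covering a disjoint set of classes.

    classes_per_task=2 is an assumed split, not one stated in the abstract.
    """
    classes = sorted(set(labels))
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        task_classes = classes[start:start + classes_per_task]
        wanted = set(task_classes)
        indices = [i for i, y in enumerate(labels) if y in wanted]
        tasks.append((task_classes, indices))
    return tasks


def mlp_param_count(layer_sizes, bias=True):
    """Count weights (and optionally biases) of a fully connected MLP."""
    return sum(
        n_in * n_out + (n_out if bias else 0)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Toy labels standing in for MNIST targets (digits 0-9, five samples each).
toy_labels = [i % 10 for i in range(50)]
tasks = class_incremental_tasks(toy_labels)
for task_classes, indices in tasks:
    print(task_classes, len(indices))

# Hypothetical MLP with one hidden layer of width 128 on flattened 28x28 inputs.
print(mlp_param_count([784, 128, 10]))
```

In a class-incremental run, the model would be trained on the tasks one after another and evaluated on all classes seen so far; the parameter count gives a target that a KAN configuration could be tuned to match.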
