Suitability of KANs for Computer Vision: A preliminary investigation
Basim Azam, Naveed Akhtar
TL;DR
This study evaluates Kolmogorov-Arnold Networks (KANs) for computer vision by implementing learnable edge-wise functions in several architectures, including ConvKAN, WavKAN, and UKAN-UNet. It benchmarks these models against standard CNNs and MLPs on MNIST, CIFAR-10, and CamVid to assess accuracy, training efficiency, and parameter usage. Results show that KAN-based models can achieve competitive accuracy and offer flexible representations, though performance on more complex data often requires richer edge functions and careful tuning. The work provides empirical benchmarks and suggests practical considerations for scaling KANs in vision tasks, pointing to future work on optimization, generalization, and efficiency.
Abstract
Kolmogorov-Arnold Networks (KANs) introduce a paradigm of neural modeling that implements learnable functions on the edges of the network, diverging from the traditional node-centric activations in neural networks. This work assesses the applicability and efficacy of KANs in visual modeling, focusing on fundamental recognition and segmentation tasks. We mainly analyze the performance and efficiency of different network architectures built using KAN concepts along with conventional building blocks of convolutional and linear layers, enabling a comparative analysis with conventional models. Our findings aim to contribute to understanding the potential of KANs in computer vision, highlighting both their strengths and areas for further research. Our evaluation indicates that while KAN-based architectures perform in line with the original claims, it may often be important to employ more complex functions on the network edges to retain the performance advantage of KANs on more complex visual data.
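To make the idea of learnable functions on the edges concrete, the following is a minimal NumPy sketch of a single KAN-style layer (forward pass only). It is an illustration under stated assumptions, not the implementation evaluated in this work: the class name ToyKANLayer, the Gaussian radial basis, and all hyperparameters are hypothetical choices made for brevity. The original KAN formulation parameterizes its edge functions with B-splines, which this sketch replaces with a simpler basis.

```python
import numpy as np


class ToyKANLayer:
    """Toy KAN-style layer: every edge (input i -> output j) has its own
    learnable 1-D function phi_ji, and each output node only sums its edges.

    Assumption: phi_ji is a weighted sum of fixed Gaussian bumps, chosen
    for brevity instead of the B-splines used in the original KAN work.
    """

    def __init__(self, in_dim, out_dim, num_basis=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed basis centers shared by all edges.
        self.centers = np.linspace(x_min, x_max, num_basis)
        self.width = (x_max - x_min) / (num_basis - 1)
        # Learnable part: one coefficient vector per edge.
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, num_basis))

    def basis(self, x):
        # x: (batch, in_dim) -> basis activations: (batch, in_dim, num_basis)
        z = (x[..., None] - self.centers) / self.width
        return np.exp(-(z ** 2))

    def __call__(self, x):
        # y[b, j] = sum_i phi_ji(x[b, i])
        #         = sum_i sum_k coef[j, i, k] * basis_k(x[b, i])
        return np.einsum("bik,jik->bj", self.basis(x), self.coef)


layer = ToyKANLayer(in_dim=5, out_dim=3)
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(4, 5))
y = layer(x)
print(y.shape)  # (4, 3)
```

In contrast to a linear layer followed by a fixed node-wise activation, the nonlinearity here lives in the per-edge coefficients. Under this reading, using "more complex functions on the network edges" would correspond to a richer basis or more coefficients per edge, at the cost of more parameters.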
