Suitability of KANs for Computer Vision: A preliminary investigation

Basim Azam, Naveed Akhtar

TL;DR

This study evaluates Kolmogorov-Arnold Networks (KANs) for computer vision by implementing learnable edge-wise functions in several architectures, including ConvKAN, WavKAN, and UKAN-UNet. It benchmarks these models against standard CNNs and MLPs on MNIST, CIFAR-10, and CamVid to assess accuracy, training efficiency, and parameter usage. Results show that KAN-based models can achieve competitive accuracy and offer flexible representations, though performance on more complex data often requires richer edge functions and careful tuning. The work provides empirical benchmarks and suggests practical considerations for scaling KANs in vision tasks, pointing to future work on optimization, generalization, and efficiency.

Abstract

Kolmogorov-Arnold Networks (KANs) introduce a paradigm of neural modeling that implements learnable functions on the edges of the network, diverging from the traditional node-centric activations in neural networks. This work assesses the applicability and efficacy of KANs in visual modeling, focusing on fundamental recognition and segmentation tasks. We mainly analyze the performance and efficiency of different network architectures built from KAN concepts combined with conventional convolutional and linear layers, enabling a comparative analysis with conventional models. Our findings aim to contribute to understanding the potential of KANs in computer vision, highlighting both their strengths and areas for further research. Our evaluation indicates that, while KAN-based architectures perform in line with the original claims, it may often be important to employ more complex functions on the network edges to retain the performance advantage of KANs on more complex visual data.

Paper Structure

This paper contains 30 sections, 18 equations, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Categorization of the types of network architectures used in this work. We employ KAN-based building blocks with conventional layers to construct different types of networks. The same naming conventions are used throughout this work.
  • Figure 2: A high-level comparison of basic network configurations using Multi-Layer Perceptrons (MLP), Kolmogorov-Arnold Networks (KAN), and Wavelet KAN (WavKAN). KAN-based models use learnable functions on edges instead of applying fixed activation functions on nodes/neurons. Traditional KAN and WavKAN mainly differ in the types of functions used. The number of nodes in each network layer is given at the bottom.
  • Figure 3: (a) Architectural overview of the KConvKAN used in our experiments for classification on the MNIST dataset. (b) Visualization of feature maps and spline weights for the corresponding layer in (a).
  • Figure 4: (A) Overall UKAN Architecture: The UKAN architecture comprises an encoder (Down Blocks), a bottleneck, and a decoder (Up Blocks) pathway. Each block in the encoder and decoder contains UKAN convolutional layers, which are connected by skip connections (dashed lines) to corresponding layers across the encoder and decoder. The input image is progressively downsampled and then upsampled to produce the segmentation output. (B) UKANConv Block: This block, used within the Down and Up blocks, performs two consecutive KConvKAN operations, each followed by Batch Normalization and ReLU activation, which together transform the input feature maps. (C) Conceptual KConvKAN: The KConvKAN operation involves applying learnable spline functions (denoted by $\phi$) to the input features $x_1$ and $x_2$, producing output features $o_1$ and $o_2$.
  • Figure 5: Comparative performance visualization of different models on MNIST and CIFAR-10 datasets. The sizes and colors of the circles represent the scale of model parameters and performance metrics respectively, showcasing a range of outcomes from simple to complex models.
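
The edge-function idea described in the captions of Figures 2 and 4(C) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes piecewise-linear (degree-1 B-spline) basis functions on a fixed uniform grid as a simple stand-in for the learnable functions $\phi$, and the names `hat_basis`, `KANLayer`, `grid_size`, and `x_range` are hypothetical, chosen only for illustration. Each edge from input $x_j$ to output $o_i$ carries its own coefficient vector, and each output is the sum of its incoming edge functions.

```python
import numpy as np


def hat_basis(x, grid):
    """Evaluate piecewise-linear (degree-1 B-spline) basis functions.

    x:    array of any shape, inputs expected inside [grid[0], grid[-1]].
    grid: 1-D array of G equally spaced knots.
    Returns an array of shape x.shape + (G,).
    """
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[..., None] - grid) / h, 0.0, None)


class KANLayer:
    """Illustrative KAN-style layer: one learnable 1-D function per edge.

    Assumption: phi_ij(x) = sum_g coef[i, j, g] * hat_g(x), so the
    coefficients are the learnable parameters (no training loop here).
    """

    def __init__(self, in_dim, out_dim, grid_size=8, x_range=(-1.0, 1.0), seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_range[0], x_range[1], grid_size)
        # Shape (out_dim, in_dim, grid_size): one coefficient vector per edge.
        self.coef = rng.normal(scale=0.1, size=(out_dim, in_dim, grid_size))

    def __call__(self, x):
        # x has shape (batch, in_dim); basis has shape (batch, in_dim, G).
        basis = hat_basis(x, self.grid)
        # o_i = sum_j phi_ij(x_j), computed for the whole batch at once.
        return np.einsum("bjg,ijg->bi", basis, self.coef)


# Two inputs (x_1, x_2) mapped to two outputs (o_1, o_2), as in Figure 4(C).
layer = KANLayer(in_dim=2, out_dim=2)
x = np.array([[0.3, -0.5], [0.0, 0.9]])
o = layer(x)  # shape (2, 2)
```

By contrast, an MLP layer would compute a weighted sum of the inputs and then apply one fixed activation at each node. Here the nonlinearity lives on the edges and is itself parameterized, which is why richer function families (for example wavelets in WavKAN) can be swapped in by changing only the basis.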