Convolutional Kolmogorov-Arnold Networks

Alexander Dylan Bodner, Antonio Santiago Tepsich, Jack Natan Spolski, Santiago Pourteau

TL;DR

The paper tackles parameter efficiency in vision models by introducing Convolutional Kolmogorov-Arnold Networks, which replace the fixed weights of convolutional kernels with learnable spline-based activations (B-splines). The authors design KAN Convolutions, discuss grid extension to keep the splines well-behaved, and analyze the effect on parameter count, reporting competitive Fashion-MNIST accuracy with substantially fewer parameters in several configurations. They compare multiple architectures experimentally and highlight both potential gains in expressivity and practical challenges, such as slower training because the spline computations do not parallelize well on GPUs. The work presents Convolutional KANs as a promising, parameter-efficient alternative to standard CNNs and outlines directions for improving scalability and interpretability in future research.

Abstract

In this paper, we present Convolutional Kolmogorov-Arnold Networks, a novel architecture that integrates the learnable spline-based activation functions of Kolmogorov-Arnold Networks (KANs) into convolutional layers. By replacing traditional fixed-weight kernels with learnable non-linear functions, Convolutional KANs offer a significant improvement in parameter efficiency and expressive power over standard Convolutional Neural Networks (CNNs). We empirically evaluate Convolutional KANs on the Fashion-MNIST dataset, demonstrating competitive accuracy with up to 50% fewer parameters compared to baseline classic convolutions. This suggests that the KAN Convolution can effectively capture complex spatial relationships with fewer resources, offering a promising alternative for parameter-efficient deep learning models.
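To make the idea in the abstract concrete, here is a minimal sketch of a KAN-style convolution on a single-channel image, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: each kernel position holds a learnable univariate function `phi(x) = w_base * silu(x) + w_spline * spline(x)`, and the output is the sum of `phi` applied to each pixel in the patch. The names `hat_basis`, `phi`, `kan_conv2d`, `w_base`, and `w_spline` are hypothetical, and piecewise-linear (order-1) B-splines stand in for the higher-order B-splines a real KAN layer would likely use.

```python
import math


def silu(x):
    """SiLU (Sigmoid Linear Unit): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def hat_basis(x, knots):
    """Piecewise-linear (order-1) B-spline basis values at x, one per knot.

    Every value is 0 when x lies outside [knots[0], knots[-1]].
    """
    values = []
    last = len(knots) - 1
    for k, t in enumerate(knots):
        if k > 0 and knots[k - 1] <= x <= t:
            values.append((x - knots[k - 1]) / (t - knots[k - 1]))
        elif k < last and t <= x <= knots[k + 1]:
            values.append((knots[k + 1] - x) / (knots[k + 1] - t))
        else:
            values.append(0.0)
    return values


def phi(x, params, knots):
    """Learnable univariate function stored at one kernel position (assumed form)."""
    basis = hat_basis(x, knots)
    spline = sum(c * b for c, b in zip(params["coeffs"], basis))
    return params["w_base"] * silu(x) + params["w_spline"] * spline


def kan_conv2d(image, kernel, knots):
    """Slide a K x K grid of phi functions over the image (stride 1, no padding).

    A standard convolution would compute sum(w_ij * x_ij) over the patch;
    here each scalar weight w_ij is replaced by a function phi_ij.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            out[r][c] = sum(
                phi(image[r + i][c + j], kernel[i][j], knots)
                for i in range(k)
                for j in range(k)
            )
    return out


# Sanity check: spline coefficients equal to the knot values make the spline
# the identity on [-1, 1]; with w_base = 0 every phi is then just x.
knots = [-1.0, -0.5, 0.0, 0.5, 1.0]
identity = {"w_base": 0.0, "w_spline": 1.0, "coeffs": list(knots)}
kernel = [[identity, identity], [identity, identity]]
image = [
    [0.0, 0.5, -0.5],
    [1.0, -1.0, 0.25],
    [0.75, 0.0, -0.25],
]
result = kan_conv2d(image, kernel, knots)
```

With these settings every `phi` is the identity on $[-1, 1]$, so each entry of `result` equals the plain sum of its 2x2 patch, which is what a standard convolution with an all-ones kernel would produce. Training would instead adjust `coeffs`, `w_base`, and `w_spline` separately at each kernel position.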

Paper Structure

This paper contains 19 sections, 16 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Splines learned by the first convolution at the first position, plotted over two ranges: $[-1, 1]$ (left) and $[-10, 10]$ (right). The SiLU (Sigmoid Linear Unit) function is added to the spline across the entire range, but the spline is only defined within $[-1, 1]$, so outside this range the SiLU function predominates.
  • Figure 2: KAN architectures used in the experiments. A max pooling layer follows every convolutional layer; for simplicity, the diagram shows it only at the end. Every architecture ends with a log-softmax layer.
  • Figure 3: Standard architectures used in the experiments. A max pooling layer follows every convolutional layer; for simplicity, the diagram shows it only at the end. Every architecture ends with a log-softmax layer.
  • Figure 4: Parameter count vs. accuracy on the Fashion-MNIST dataset.
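
The behavior described in the Figure 1 caption can be reproduced with a small sketch. It assumes the learned function has the form `phi(x) = silu(x) + spline(x)` with unit weights on both terms, that the spline is piecewise linear on a uniform grid over $[-1, 1]$, and that it contributes nothing outside that grid. The coefficient values are made up for illustration and are not learned weights from the paper.

```python
import math


def silu(x):
    """SiLU (Sigmoid Linear Unit): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def spline(x, coeffs, lo=-1.0, hi=1.0):
    """Piecewise-linear spline on a uniform grid over [lo, hi]; 0 outside it."""
    if x < lo or x > hi:
        return 0.0
    step = (hi - lo) / (len(coeffs) - 1)
    return sum(
        c * max(0.0, 1.0 - abs(x - (lo + k * step)) / step)
        for k, c in enumerate(coeffs)
    )


def phi(x, coeffs):
    """SiLU plus spline; only the SiLU term survives outside the grid."""
    return silu(x) + spline(x, coeffs)


coeffs = [0.3, -0.2, 0.5, -0.4, 0.1]  # made-up values, not learned weights

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:+6.1f}  spline={spline(x, coeffs):+.3f}  phi={phi(x, coeffs):+.3f}")
```

Inside $[-1, 1]$ the spline term shapes the function, while at `x = -10.0` and `x = 10.0` the spline contributes zero and `phi` coincides with SiLU. Inputs that drift outside the grid therefore lose the learned component, which is the situation grid extension is meant to handle.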