Residual Kolmogorov-Arnold Network for Enhanced Deep Learning

Ray Congrui Yu, Sherry Wu, Jiang Gui

TL;DR

This work introduces RKAN, a lightweight plug-in module that augments traditional CNN stages with polynomial feature transformations via learnable KAN activations, guided by the Kolmogorov-Arnold representation theorem. By implementing a cross-stage residual path that runs in parallel to the backbone, RKAN enhances gradient flow and provides a regularizing, expressive alternate path for feature transformation, without altering the core architecture. Across Tiny ImageNet, CIFAR-100, Food-101, ImageNet, and COCO, RKAN yields consistent accuracy gains, faster convergence, and improved training stability, with Chebyshev polynomials often outperforming Gaussian RBFs. The approach is especially beneficial on small datasets, delivering notable improvements with modest computational overhead and broad compatibility with modern backbones and downstream tasks such as object detection and segmentation.
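The TL;DR compares two basis families for the learnable KAN activations, Chebyshev polynomials and Gaussian RBFs. The sketch below shows how each family expands a scalar into a vector of basis values, which a learnable coefficient vector then combines as φ(x) = Σ_k c_k B_k(x). It illustrates the general idea under stated assumptions and is not the authors' code. Two conventions are borrowed from common ChebyKAN/FastKAN-style implementations: the input is squashed with tanh before the Chebyshev recurrence, and the Gaussians use fixed centers with a shared width. All function names are hypothetical.

```python
import math
from typing import List


def chebyshev_basis(x: float, degree: int) -> List[float]:
    """T_0..T_degree evaluated at tanh(x); tanh keeps the argument inside (-1, 1).

    Assumption: tanh squashing follows common ChebyKAN-style code, not the paper.
    """
    t = math.tanh(x)
    values = [1.0, t]
    for _ in range(2, degree + 1):
        # Chebyshev recurrence: T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)
        values.append(2.0 * t * values[-1] - values[-2])
    return values[: degree + 1]


def gaussian_rbf_basis(x: float, centers: List[float], width: float) -> List[float]:
    """exp(-((x - mu_k) / width)^2) for each fixed center mu_k (FastKAN-style assumption)."""
    return [math.exp(-(((x - mu) / width) ** 2)) for mu in centers]


def kan_activation(basis: List[float], coeffs: List[float]) -> float:
    """A KAN edge function: learnable coefficients times a fixed basis."""
    if len(basis) != len(coeffs):
        raise ValueError("need one coefficient per basis function")
    return sum(c * b for c, b in zip(coeffs, basis))


# Both families plug into the same learnable form phi(x) = sum_k c_k B_k(x).
x = 0.4
cheb = kan_activation(chebyshev_basis(x, 3), [0.1, 0.8, -0.2, 0.05])
rbf = kan_activation(gaussian_rbf_basis(x, [-1.0, 0.0, 1.0], width=0.5), [0.3, -0.1, 0.7])
print(round(cheb, 4), round(rbf, 4))
```

Once the input is squashed into (-1, 1), every Chebyshev basis value also stays within [-1, 1], because T_k(cos θ) = cos(kθ). The Gaussian basis, by contrast, responds only near its centers, so the placement of its grid and its width become extra hyperparameters.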

Abstract

Despite their immense success, deep convolutional neural networks (CNNs) can be difficult to optimize and costly to train because they are often hundreds of layers deep. Conventional convolutional operations are fundamentally limited by their linear nature and fixed activations, so many layers are needed to learn meaningful patterns in data. Given the sheer size of these networks, this approach is computationally inefficient and risks overfitting or gradient explosion, especially on small datasets. To address this, we introduce a "plug-in" module called Residual Kolmogorov-Arnold Network (RKAN). The module is highly compact, so it can easily be added to any stage (level) of a traditional deep network, where it learns to integrate supportive polynomial feature transformations into the existing convolutional framework. RKAN offers consistent improvements over baseline models across different vision tasks and widely tested benchmarks, achieving cutting-edge performance on them.
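The abstract describes RKAN as a compact side path attached to one stage of an existing network. Below is a minimal sketch of that wiring. To keep it dependency-free, a feature map is reduced to a single vector of channel values. The branch reads the stage's input, and its output is summed with the stage's output, so the two must have the same number of channels. The branch internals are assumptions: a channel-reducing projection, a per-channel Chebyshev KAN activation, and a projection back to the stage's output width. Reading the "reduce factor" of Figure 1 as a channel reduction is likewise an assumption, and all names are hypothetical.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def project(weights: Matrix, x: Vector) -> Vector:
    """Channel-mixing projection, i.e. a 1x1 convolution applied at one pixel."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cheby_act(x: float, coeffs: Vector) -> float:
    """phi(x) = sum_k c_k T_k(tanh(x)), evaluated with the Chebyshev recurrence."""
    t = math.tanh(x)
    t_prev, t_curr = 1.0, t
    total = coeffs[0] * t_prev
    for c in coeffs[1:]:
        total += c * t_curr
        t_prev, t_curr = t_curr, 2.0 * t * t_curr - t_prev
    return total


def make_rkan_branch(w_reduce: Matrix, w_expand: Matrix, coeffs: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical RKAN side path: reduce channels, apply a KAN activation, expand."""
    def branch(x: Vector) -> Vector:
        z = project(w_reduce, x)                 # C_in -> C_in / r (assumed role of the reduce factor)
        z = [cheby_act(zi, coeffs) for zi in z]  # learnable polynomial transform per channel
        return project(w_expand, z)              # -> C_out so it can be added to the stage output
    return branch


def rkan_augmented_stage(stage: Callable[[Vector], Vector],
                         branch: Callable[[Vector], Vector],
                         x: Vector) -> Vector:
    """Cross-stage residual: the branch reads the stage input and is added to the stage output."""
    main = stage(x)
    side = branch(x)
    if len(main) != len(side):
        raise ValueError("RKAN branch must match the stage's output channels")
    return [m + s for m, s in zip(main, side)]


def toy_stage(x: Vector) -> Vector:
    """Stand-in for a backbone stage mapping 4 channels to 2 channels."""
    return [x[0] + x[1], x[2] - x[3]]


# Toy usage: 4 input channels, reduce factor r = 2, zero-initialised expansion.
w_reduce = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]  # 4 -> 2
w_expand_zero = [[0.0, 0.0], [0.0, 0.0]]                  # 2 -> 2
branch = make_rkan_branch(w_reduce, w_expand_zero, coeffs=[0.0, 1.0, 0.25])
x = [1.0, 2.0, 3.0, 4.0]
print(rkan_augmented_stage(toy_stage, branch, x))  # [3.0, -1.0]
```

In the toy usage, the final projection is initialized to zero. This is one option, not something the summary states. With it, the augmented stage starts out identical to the baseline stage, and the side path contributes only as its weights are learned.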

Paper Structure

This paper contains 18 sections, 15 equations, 7 figures, 10 tables, and 1 algorithm.

Figures (7)

  • Figure 1: RKAN-ResNet-50 (RKANet-50) with reduce factor = 2 (L); RKANet-50-4×L with inverse bottleneck expansion multiplier = 4 (R).
  • Figure 2: Top-1 accuracy of RKAN-augmented and baseline model variants plotted against GigaFLOPs (L) and throughput (Mid), and the accuracy gain (R), computed as the Top-1 accuracy difference within each RKAN-baseline pair.
  • Figure 3: Effect of the reduce factor on Top-1 accuracy (L) and throughput (Mid) for RKAN-augmented models on the Tiny ImageNet validation set; the x marker indicates the performance of the baseline models. Validation accuracy curves over the course of training (R).
  • Figure 4: FLOPs and throughput of RBF-based and Chebyshev polynomial-based RKAN-augmented models on the ILSVRC-2012 ImageNet dataset (deng2009imagenet) at 224×224 resolution.
  • Figure 5: KAN-based convolutional layer implemented using Chebyshev polynomials, applied depthwise (independently to each channel).
  • ...and 2 more figures
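Figure 5 describes a Chebyshev-polynomial KAN convolution applied depthwise. The sketch below assumes it follows the common KAN-convolution formulation. Each kernel tap (i, j) owns its own learnable function φ_ij(v) = Σ_d c_ij,d T_d(tanh v), and an output value is the sum of these functions over the receptive field. "Depthwise" means every channel has its own kernel and channels are never mixed. The sketch is in plain Python and uses stride 1 with no padding. All names are hypothetical.

```python
import math
from typing import List

Image = List[List[float]]          # H x W values for one channel
Kernel = List[List[List[float]]]   # k x k taps, each holding (degree + 1) Chebyshev coefficients


def chebyshev_features(v: float, degree: int) -> List[float]:
    """[T_0(tanh v), ..., T_degree(tanh v)] via the Chebyshev recurrence."""
    t = math.tanh(v)
    feats = [1.0, t]
    for _ in range(2, degree + 1):
        feats.append(2.0 * t * feats[-1] - feats[-2])
    return feats[: degree + 1]


def cheby_kan_conv2d_single(img: Image, kernel: Kernel) -> Image:
    """Valid (no padding, stride 1) KAN convolution on one channel.

    Tap (i, j) applies phi_ij(v) = sum_d kernel[i][j][d] * T_d(tanh v);
    each output value is the sum of phi_ij over its k x k receptive field.
    """
    k = len(kernel)
    degree = len(kernel[0][0]) - 1
    h, w = len(img), len(img[0])
    # Expand every pixel into its Chebyshev features once, then reuse them.
    feats = [[chebyshev_features(v, degree) for v in row] for row in img]
    out: Image = []
    for r in range(h - k + 1):
        out_row = []
        for c in range(w - k + 1):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    basis = feats[r + i][c + j]
                    acc += sum(a * b for a, b in zip(kernel[i][j], basis))
            out_row.append(acc)
        out.append(out_row)
    return out


def depthwise_cheby_kan_conv2d(channels: List[Image], kernels: List[Kernel]) -> List[Image]:
    """Apply a separate KAN kernel to each channel, with no cross-channel mixing."""
    if len(channels) != len(kernels):
        raise ValueError("depthwise convolution needs one kernel per channel")
    return [cheby_kan_conv2d_single(img, ker) for img, ker in zip(channels, kernels)]


# Toy usage: two 3x3 channels and a 2x2 kernel of degree-2 Chebyshev coefficients.
x = [[[0.1, -0.3, 0.5], [0.2, 0.0, -0.4], [0.6, 0.1, -0.2]],   # channel 0
     [[1.0, 0.5, -1.0], [0.0, 0.3, 0.2], [-0.5, 0.4, 0.1]]]    # channel 1
kern = [[[0.0, 0.5, 0.1], [0.0, -0.2, 0.3]],
        [[0.1, 0.4, 0.0], [0.0, 0.0, -0.1]]]
y = depthwise_cheby_kan_conv2d(x, [kern, kern])
print(len(y), len(y[0]), len(y[0][0]))  # 2 2 2
```

Suppose only the degree-1 coefficients are non-zero. The layer then reduces to an ordinary depthwise convolution of tanh(x), because T_1(tanh v) = tanh v. The higher-degree coefficients are what add the polynomial feature transformations.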