When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation
Nishchal Sapkota, Haoyan Shi, Yejia Zhang, Xianshi Ma, Bofang Zheng, Danny Z. Chen
TL;DR
The paper tackles data efficiency in medical image segmentation by enhancing Transformer-based encoders with Kolmogorov–Arnold Networks (KANs). It introduces UKAST, a U-Net-like architecture that embeds Group Rational KANs (GR-KANs) as learnable, rational-function-based feed-forward layers within a Swin Transformer encoder, paired with a CNN decoder. UKAST reports state-of-the-art performance on four 2D and 3D benchmarks and remains accurate in data-scarce settings; ablations confirm the contribution of the GR-KANs and of SwinT with residual convolutions. The work shows that KAN-enhanced Transformers can gain expressiveness and data efficiency with reduced FLOPs and only a very small parameter increase relative to SwinUNETR.
Abstract
Medical image segmentation is critical for accurate diagnostics and treatment planning, but remains challenging due to complex anatomical structures and limited annotated training data. CNN-based segmentation methods excel at local feature extraction but struggle to model long-range dependencies. Transformers, on the other hand, capture global context more effectively, but are inherently data-hungry and computationally expensive. In this work, we introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders. By leveraging rational base functions and Group Rational KANs (GR-KANs) from the Kolmogorov-Arnold Transformer (KAT), our architecture addresses the inefficiencies of vanilla spline-based KANs, yielding a more expressive and data-efficient framework with reduced FLOPs and only a very small increase in parameter count compared to SwinUNETR. UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks, consistently surpassing both CNN- and Transformer-based baselines. Notably, it attains superior accuracy in data-scarce settings, alleviating the data-hungry limitations of standard Vision Transformers. These results show the potential of KAN-enhanced Transformers to advance data-efficient medical image segmentation. Code is available at: https://github.com/nsapkota417/UKAST
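
To make the idea of a group rational activation concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes the "safe" rational form P(x) / (1 + |Q(x)|), which keeps the denominator positive, and it assumes channels are split into equal-sized contiguous groups that each share one set of coefficients. The names `rational` and `group_rational` and all coefficient values are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def rational(x: float, num: Sequence[float], den: Sequence[float]) -> float:
    """Evaluate P(x) / (1 + |Q(x)|).

    Assumed form: P(x) = num[0] + num[1]*x + ..., and
    Q(x) = den[0]*x + den[1]*x^2 + ... (no constant term),
    so the denominator is always at least 1.
    """
    p = sum(a * x ** i for i, a in enumerate(num))
    q = sum(b * x ** (j + 1) for j, b in enumerate(den))
    return p / (1.0 + abs(q))


def group_rational(
    features: Sequence[float],
    num_coeffs: Sequence[Sequence[float]],
    den_coeffs: Sequence[Sequence[float]],
) -> List[float]:
    """Apply one shared rational function per contiguous channel group."""
    groups = len(num_coeffs)
    if len(den_coeffs) != groups or len(features) % groups != 0:
        raise ValueError("channels must split evenly across coefficient groups")
    size = len(features) // groups
    return [
        rational(x, num_coeffs[i // size], den_coeffs[i // size])
        for i, x in enumerate(features)
    ]


# Illustrative coefficients (hypothetical, not learned values):
# group 0 is the identity x / 1, group 1 is 1 / (1 + |x|).
channels = [1.0, 2.0, 3.0, 4.0]
activated = group_rational(
    channels,
    num_coeffs=[[0.0, 1.0], [1.0, 0.0]],
    den_coeffs=[[0.0], [1.0]],
)
print(activated)
```

Because the coefficients are shared within each group, the number of activation parameters in this sketch grows with the number of groups rather than with the number of channels or input-output pairs. In a full layer, such an activation would typically be followed by a learned linear projection, which is omitted here.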
