UKAN-EP: Enhancing U-KAN with Efficient Attention and Pyramid Aggregation for 3D Multi-Modal MRI Brain Tumor Segmentation
Yanbing Chen, Tianze Tang, Taehyo Kim, Hai Shu
TL;DR
This work tackles 3D brain tumor segmentation from multi-modal MRI by introducing UKAN-EP, a 3D extension of U-KAN that combines KAN-based bottlenecks with Efficient Channel Attention (ECA) and Pyramid Feature Aggregation (PFA) and is trained with a dynamically weighted cross-entropy/Dice loss. On BraTS-GLI, the method achieves state-of-the-art accuracy, with a whole-tumor Dice of $0.9001$ and IoU of $0.8257$, at a cost of $223.57$ GFLOPs and $11.30$M parameters, making it substantially more efficient than several baselines. Ablation studies confirm the pivotal roles of ECA and PFA, show limited benefits from self-attention or ViT integration, and demonstrate the superiority of the dynamic loss over fixed-weight alternatives. Overall, UKAN-EP offers a favorable accuracy-efficiency trade-off for robust 3D multi-modal MRI brain tumor segmentation, with potential for clinical deployment and extension to broader datasets and modalities.
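For readers less familiar with the two overlap metrics quoted above, the following is a minimal sketch of how Dice and IoU are conventionally computed for a single binary mask pair. The function name `dice_and_iou`, the flat 0/1 list representation, and the handling of two empty masks are illustrative assumptions, not details taken from the paper's evaluation code.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length binary masks.

    Masks are flat sequences of 0/1 voxel labels (a hypothetical
    representation chosen for illustration).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels labelled positive in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_size = sum(1 for p in pred if p)
    target_size = sum(1 for t in target if t)
    union = pred_size + target_size - intersection

    if union == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention).
        return 1.0, 1.0

    dice = 2.0 * intersection / (pred_size + target_size)
    iou = intersection / union
    return dice, iou


# Toy example: six voxels, two of which overlap.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
dice, iou = dice_and_iou(pred, target)
```

For a single mask pair the two metrics are tied by Dice = 2 IoU / (1 + IoU). Averages reported over many cases need not satisfy this identity exactly.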
Abstract
Background: Gliomas are among the most common malignant brain tumors and exhibit substantial heterogeneity, complicating accurate detection and segmentation. Although multi-modal MRI is the clinical standard for glioma imaging, variability across modalities and high computational demands hamper effective automated segmentation. Methods: We propose UKAN-EP, a novel 3D extension of the original 2D U-KAN model for multi-modal MRI brain tumor segmentation. While U-KAN integrates Kolmogorov-Arnold Network (KAN) layers into a U-Net backbone, UKAN-EP further incorporates Efficient Channel Attention (ECA) and Pyramid Feature Aggregation (PFA) modules to enhance inter-modality feature fusion and multi-scale feature representation. We also introduce a dynamic loss weighting strategy that adaptively balances cross-entropy and Dice losses during training. Results: On the 2024 BraTS-GLI dataset, UKAN-EP achieves superior segmentation performance (e.g., Dice = 0.9001 $\pm$ 0.0127 and IoU = 0.8257 $\pm$ 0.0186 for the whole tumor) while requiring substantially fewer computational resources (223.57 GFLOPs and 11.30M parameters) than strong baselines, including U-Net, Attention U-Net, Swin UNETR, VT-Unet, TransBTS, and 3D U-KAN. An extensive ablation study further confirms the effectiveness of ECA and PFA and shows the limited utility of self-attention and spatial attention alternatives. Conclusion: UKAN-EP demonstrates that combining the expressive power of KAN layers with lightweight channel-wise attention and multi-scale feature aggregation improves both the accuracy and the efficiency of brain tumor segmentation.
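To make the "lightweight channel-wise attention" idea concrete, here is a minimal sketch of the generic Efficient Channel Attention computation: global average pooling per channel, a small 1D convolution across the channel axis, a sigmoid gate, and channel-wise rescaling. It is written in plain Python on toy data. The function name `eca`, the flat per-channel voxel lists, and the fixed kernel weights are all assumptions made for illustration (in a trained network the kernel is learned), and the sketch does not reproduce how UKAN-EP places or configures the module.

```python
import math


def eca(channels, kernel):
    """Apply an ECA-style gate to a list of channels.

    channels: list of C channels, each a flat list of voxel values
              (hypothetical layout for illustration).
    kernel:   odd-length list of 1D convolution weights (placeholder
              values here; learned in a real model).
    Returns (rescaled_channels, gates).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    means = [sum(ch) / len(ch) for ch in channels]

    # 2) 1D convolution across the channel axis with zero padding,
    #    so each channel interacts only with its k nearest neighbours.
    padded = [0.0] * pad + means + [0.0] * pad
    logits = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(means))
    ]

    # 3) Sigmoid turns each logit into a gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]

    # 4) Rescale every voxel of a channel by that channel's gate.
    rescaled = [[g * v for v in ch] for g, ch in zip(gates, channels)]
    return rescaled, gates


# Toy input: four channels with two voxels each.
toy_channels = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]
rescaled, gates = eca(toy_channels, kernel=[0.25, 0.5, 0.25])
```

The gate costs only `k` weights regardless of the number of channels, which is why this style of attention adds little to the parameter count and FLOPs.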
