UKAN-EP: Enhancing U-KAN with Efficient Attention and Pyramid Aggregation for 3D Multi-Modal MRI Brain Tumor Segmentation

Yanbing Chen, Tianze Tang, Taehyo Kim, Hai Shu

TL;DR

This work tackles 3D brain tumor segmentation from multi-modal MRI by introducing UKAN-EP, a 3D extension of U-KAN that fuses KAN-based bottlenecks with Efficient Channel Attention and Pyramid Feature Aggregation, guided by a dynamic cross-entropy/Dice loss. The method achieves state-of-the-art accuracy on BraTS-GLI with a Dice of $0.9001$ for the whole tumor and an IoU of $0.8257$, while maintaining a compact footprint of $223.57$ GFLOPs and $11.30$M parameters, substantially outperforming several baselines in efficiency. Ablation studies confirm the pivotal roles of ECA and PFA, show limited benefits from self-attention or ViT integration, and demonstrate the superiority of the dynamic loss over fixed-weight alternatives. Overall, UKAN-EP offers a favorable accuracy-efficiency trade-off for robust 3D multi-modal MRI brain tumor segmentation, with potential for clinical deployment and extension to broader datasets and modalities.

Abstract

Background: Gliomas are among the most common malignant brain tumors and exhibit substantial heterogeneity, complicating accurate detection and segmentation. Although multi-modal MRI is the clinical standard for glioma imaging, variability across modalities and high computational demands hamper effective automated segmentation. Methods: We propose UKAN-EP, a novel 3D extension of the original 2D U-KAN model for multi-modal MRI brain tumor segmentation. While U-KAN integrates Kolmogorov-Arnold Network (KAN) layers into a U-Net backbone, UKAN-EP further incorporates Efficient Channel Attention (ECA) and Pyramid Feature Aggregation (PFA) modules to enhance inter-modality feature fusion and multi-scale feature representation. We also introduce a dynamic loss weighting strategy that adaptively balances cross-entropy and Dice losses during training. Results: On the 2024 BraTS-GLI dataset, UKAN-EP achieves superior segmentation performance (e.g., Dice = 0.9001 $\pm$ 0.0127 and IoU = 0.8257 $\pm$ 0.0186 for the whole tumor) while requiring substantially fewer computational resources (223.57 GFLOPs and 11.30M parameters) compared to strong baselines including U-Net, Attention U-Net, Swin UNETR, VT-Unet, TransBTS, and 3D U-KAN. An extensive ablation study further confirms the effectiveness of ECA and PFA and shows the limited utility of self-attention and spatial attention alternatives. Conclusion: UKAN-EP demonstrates that combining the expressive power of KAN layers with lightweight channel-wise attention and multi-scale feature aggregation improves the accuracy and efficiency of brain tumor segmentation.
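The Dice and IoU figures quoted in the abstract are both overlap measures between a predicted mask and the ground-truth mask. The following is a minimal sketch of how such scores are commonly computed for a single binary region (e.g., the whole tumor); it is not the authors' evaluation code. The function name `dice_and_iou`, the flat 0/1 mask representation, and the convention for two empty masks are all assumptions made for illustration.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as flat 0/1 sequences.

    Hypothetical helper for illustration; a 3D volume would be flattened first.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Toy 8-voxel example: 3 voxels overlap, each mask marks 4 voxels.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single case the two metrics are tied by IoU = Dice / (2 − Dice). That identity need not hold for averages over many cases, so, assuming the reported values are per-case means, a mean Dice of 0.9001 and a mean IoU of 0.8257 are not in conflict.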

Paper Structure (25 sections, 11 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Architecture of UKAN-EP. The model combines Tokenized KAN blocks at the bottleneck with Pyramid Feature Aggregation (PFA) and Efficient Channel Attention (ECA) modules to achieve enhanced integration and refinement of multi-modal features.
  • Figure 2: Example slices from the four MRI modalities and the ground-truth segmentation. For the truth labels, red is NETC, green is SNFH, blue is ET, and yellow is RC.
  • Figure 3: Example segmentation results for four BraTS-GLI test cases from the 2024 BraTS Challenge dataset (first two columns) and the post-Challenge dataset (last two columns). Red is NETC, green is SNFH, blue is ET, and yellow is RC.
  • Figure 4: Comparison of overall soft Dice scores (i.e., $1 -$ Dice loss; see the paper's definition of the individual Dice loss) averaged separately over the training and validation sets during U-KAN training with different ViT configurations.
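
The soft Dice score in the Figure 4 caption is the complement of the Dice loss, computed on predicted probabilities rather than thresholded masks. The sketch below shows one common formulation for a single class; the paper's exact definition is not reproduced in this summary, so the function name `soft_dice`, the smoothing term `eps`, and the single-class setup are assumptions.

```python
def soft_dice(probs, truth, eps=1e-6):
    """Soft Dice score between predicted probabilities and a 0/1 ground truth.

    Hypothetical single-class formulation; `eps` is an assumed smoothing term
    that avoids division by zero when both inputs are empty.
    """
    overlap = sum(p * t for p, t in zip(probs, truth))
    total = sum(probs) + sum(truth)
    return (2 * overlap + eps) / (total + eps)


# Toy example: four voxels, the first two belong to the target class.
probs = [0.9, 0.8, 0.2, 0.1]
truth = [1, 1, 0, 0]
score = soft_dice(probs, truth)
loss = 1 - score  # Dice loss, so that score = 1 - loss as in the caption
print(f"soft Dice = {score:.4f}, Dice loss = {loss:.4f}")
```

Because the score uses probabilities directly, it changes smoothly with the network output, which is why it can be tracked per epoch on both the training and validation sets.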