A4-Unet: Deformable Multi-Scale Attention Network for Brain Tumor Segmentation

Ruoxin Wang, Tianyi Tang, Haiming Du, Yuxuan Cheng, Yu Wang, Lingjie Yang, Xiaohui Duan, Yunfang Yu, Yu Zhou, Donglong Chen

TL;DR

Brain tumor segmentation from MRI remains challenging because tumors have irregular shapes and ambiguous boundaries. The authors introduce A4-Unet, a multi-tier architecture that combines Deformable Large Kernel Attention (DLKA) in the encoder, Swin Spatial Pyramid Pooling (SSPP) in the bottleneck, a Combined Attention Module (CAM) in the decoder, and Attention Gates (AG) in the skip connections; the CAM's channel branch uses orthogonal channel attention based on the Discrete Cosine Transform (DCT). Key contributions are a DLKA-based encoder that captures long-range context and adapts to irregular tumor shapes, SSPP for multi-scale fusion, and CAM/AG modules that refine feature fusion and edge delineation. Together these yield state-of-the-art Dice and mIoU on BraTS 2019–2021 and a proprietary dataset. The work shows that integrating deformable, multi-scale attention with transformer-inspired context yields robust brain tumor segmentation with favorable computational efficiency, and the public code eases adoption in clinical research.
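For readers less familiar with the reported metrics, the sketch below computes the Dice score and mean IoU (mIoU) for label masks using their standard overlap definitions. It is an illustrative NumPy sketch, not the authors' evaluation code: the function names and the `eps` smoothing term are assumptions, and the paper's exact protocol (for example, how scores are averaged across tumor sub-regions or cases) may differ.

```python
import numpy as np


def dice_score(pred, target, eps=1e-7):
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary masks (eps avoids 0/0)."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))


def iou_score(pred, target, eps=1e-7):
    """IoU (Jaccard) = |P ∩ T| / |P ∪ T| for binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float((intersection + eps) / (union + eps))


def mean_iou(pred_labels, target_labels, num_classes):
    """Average per-class IoU over integer label maps (classes 0..num_classes-1)."""
    ious = [iou_score(pred_labels == c, target_labels == c) for c in range(num_classes)]
    return float(np.mean(ious))


# Toy 2x3 masks: 3 predicted pixels, 3 target pixels, 2 overlapping.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
print(round(dice_score(pred, target), 4),
      round(iou_score(pred, target), 4),
      round(mean_iou(pred, target, 2), 4))  # 0.6667 0.5 0.5
```

With the `eps` term, a class absent from both prediction and target receives an IoU of 1; some benchmarks instead skip such classes when averaging.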

Abstract

Brain tumor segmentation models have aided diagnosis in recent years. However, the complexity and variability of MRI, including irregular tumor shapes and unclear boundaries, lead to noise, misclassification, and incomplete segmentation, which limit accuracy. To address these issues, we follow an established Convolutional Neural Network (CNN) design paradigm and propose a novel network named A4-Unet. In A4-Unet, Deformable Large Kernel Attention (DLKA) is incorporated into the encoder to better capture tumors at multiple scales. Swin Spatial Pyramid Pooling (SSPP) with cross-channel attention is employed in the bottleneck to further model long-distance dependencies within images and relationships between channels. To enhance accuracy, a Combined Attention Module (CAM) is introduced in the decoder; it uses Discrete Cosine Transform (DCT) orthogonality for channel weighting and convolutional element-wise multiplication for spatial weighting. Attention gates (AG) are added to the skip connections to highlight the foreground while suppressing irrelevant background information. The proposed network is evaluated on three widely used MRI brain tumor benchmarks and a proprietary dataset; it achieves a 94.4% Dice score on the BraTS 2020 dataset and establishes new state-of-the-art results on multiple benchmarks. The code is available here: https://github.com/WendyWAAAAANG/A4-Unet.
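The attention gates in the skip connections can be pictured with the additive formulation popularized by Attention U-Net: decoder features act as a gating signal that produces a per-pixel weight in [0, 1] for the encoder features. The NumPy sketch below assumes that formulation, assumes the gating signal has already been resampled to the skip-connection resolution, and uses random weights and hypothetical helper names (`conv1x1`, `attention_gate`); the abstract does not specify A4-Unet's exact gate design.

```python
import numpy as np


def conv1x1(x, w, b):
    """1x1 convolution as a per-pixel linear map: x (C_in,H,W), w (C_out,C_in), b (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_gate(x, g, params):
    """Additive attention gate (Attention U-Net style; assumed, not confirmed for A4-Unet).

    x: skip-connection features from the encoder, shape (C_x, H, W)
    g: gating features from the decoder, assumed already resampled to (C_g, H, W)
    Returns the gated skip features and the attention map alpha of shape (1, H, W).
    """
    theta_x = conv1x1(x, params["w_x"], params["b_x"])  # project skip features
    phi_g = conv1x1(g, params["w_g"], params["b_g"])    # project gating signal
    q = np.maximum(theta_x + phi_g, 0.0)                # additive fusion + ReLU
    alpha = sigmoid(conv1x1(q, params["w_psi"], params["b_psi"]))  # per-pixel weight
    return x * alpha, alpha                             # broadcast over channels


# Usage with random (hypothetical) weights.
rng = np.random.default_rng(0)
c_x, c_g, c_int, h, w = 8, 16, 4, 6, 6
params = {
    "w_x": rng.standard_normal((c_int, c_x)), "b_x": np.zeros(c_int),
    "w_g": rng.standard_normal((c_int, c_g)), "b_g": np.zeros(c_int),
    "w_psi": rng.standard_normal((1, c_int)), "b_psi": np.zeros(1),
}
x = rng.standard_normal((c_x, h, w))
g = rng.standard_normal((c_g, h, w))
gated, alpha = attention_gate(x, g, params)
print(gated.shape, alpha.shape)  # (8, 6, 6) (1, 6, 6)
```

Because alpha is a single-channel map, the gate rescales every channel at a pixel by the same factor, which is how background regions can be suppressed before the skip features are merged into the decoder.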

Paper Structure

This paper contains 28 sections, 16 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Visualization of one sample from the BraTS 2020 dataset. The tumor target varies significantly in shape, size, and distribution across slices, and multiple separate targets can appear in the same slice.
  • Figure 2: The overall architecture of our proposed A4-Unet.
  • Figure 3: DLKA dynamically adjusts its convolutional weight coefficients and deformation offsets during training, improving feature extraction from irregularly shaped objects in medical images.
  • Figure 4: The implementation of SSPP and the Cross-Contextual Attention module. The Swin Transformer uses small windows to capture local features and larger windows to capture global semantics. In the cross-attention block, multi-scale channel information is fused using an MLP layer and global average pooling (GAP) to calculate attention scores.
  • Figure 5: The general structure of the Combined Attention Module. It consists of an orthogonal channel attention branch, a convolution-based spatial attention branch, and a $1 \times 1$ convolutional block.
  • ...and 4 more figures
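Figure 5's Combined Attention Module chains channel attention, spatial attention, and a $1 \times 1$ convolution. The NumPy sketch below is one hedged reading of that structure. The channel branch uses an FcaNet-style multi-spectral scheme, in which channel groups are pooled with mutually orthogonal 2D DCT basis functions; the spatial branch is CBAM-style (channel-wise mean and max maps, a small convolution, and a sigmoid mask). The frequency choices, reduction ratio, kernel size, and all function names are assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def dct_basis(u, v, h, w):
    """2D DCT-II basis function for frequency (u, v) on an h x w grid."""
    i = np.arange(h)[:, None]
    j = np.arange(w)[None, :]
    return np.cos(np.pi * u * (i + 0.5) / h) * np.cos(np.pi * v * (j + 0.5) / w)


def dct_channel_descriptor(x, freqs):
    """Split channels into len(freqs) groups; pool group k with DCT basis freqs[k]."""
    c, h, w = x.shape
    assert c % len(freqs) == 0, "channels must divide evenly across frequency groups"
    group = c // len(freqs)
    desc = np.empty(c)
    for k, (u, v) in enumerate(freqs):
        sl = slice(k * group, (k + 1) * group)
        desc[sl] = np.einsum("chw,hw->c", x[sl], dct_basis(u, v, h, w))
    return desc


def channel_attention(x, freqs, w1, w2):
    """DCT descriptor -> FC -> ReLU -> FC -> sigmoid, then reweight channels."""
    z = dct_channel_descriptor(x, freqs)
    weights = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    return x * weights[:, None, None]


def conv2d_same(x, k):
    """Naive 'same' cross-correlation: x (C_in,H,W), odd kernel k (C_in,kh,kw) -> (H,W)."""
    c, h, w = x.shape
    _, kh, kw = k.shape
    assert kh % 2 == 1 and kw % 2 == 1, "odd kernel sizes keep the output the same size"
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * k)
    return out


def spatial_attention(x, k):
    """CBAM-style spatial weighting: [mean; max] over channels -> conv -> sigmoid mask."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    mask = sigmoid(conv2d_same(pooled, k))               # (H, W)
    return x * mask[None, :, :]                          # element-wise spatial weighting


def combined_attention(x, p):
    """Channel attention -> spatial attention -> 1x1 convolution (no bias)."""
    y = channel_attention(x, p["freqs"], p["w1"], p["w2"])
    y = spatial_attention(y, p["k_sp"])
    return np.einsum("oc,chw->ohw", p["w_out"], y)


# Usage with random (hypothetical) weights: 8 channels, reduction ratio 2, 4 output channels.
rng = np.random.default_rng(0)
c, h, w, r, c_out = 8, 7, 7, 2, 4
params = {
    "freqs": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "w1": rng.standard_normal((c // r, c)) * 0.1,
    "w2": rng.standard_normal((c, c // r)) * 0.1,
    "k_sp": rng.standard_normal((2, 3, 3)) * 0.1,
    "w_out": rng.standard_normal((c_out, c)) * 0.1,
}
x = rng.standard_normal((c, h, w))
out = combined_attention(x, params)
print(out.shape)  # (4, 7, 7)
```

Pooling with the (0, 0) basis reduces to a scaled global average pooling, so the DCT descriptor can be read as a generalization of GAP that also captures higher-frequency channel statistics through orthogonal basis functions.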