
PAM-UNet: Shifting Attention on Region of Interest in Medical Images

Abhijit Das, Debesh Jha, Vandan Gorade, Koushik Biswas, Hongyi Pan, Zheyuan Zhang, Daniela P. Ladner, Yury Velichko, Amir Borhani, Ulas Bagci

TL;DR

PAM-UNet tackles the challenge of accurate medical image segmentation on resource-constrained devices by integrating depthwise separable convolutions with a novel Progressive Luong Attention (PLA) mechanism within a U-shaped, mobile backbone. PLA progressively aggregates long-range dependencies from shallow features into the decoder, guided by encoder residuals, and is reinforced by an Attention Regularization loss to avoid over-attention. Empirical results on LiTS-2017 and Kvasir-SEG show PAM-UNet achieving Dice and mIoU scores competitive with the state of the art while maintaining a very low computational burden (around 1.32 FLOPS), outperforming several backbone- and attention-based baselines. The work also provides mechanistic insight through CKA analyses and ablations that underscore PLA as the primary driver of the performance gains, demonstrating its suitability for efficient, edge-capable medical image segmentation, with potential future extensions to 3D and Transformer-integrated models.
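The summary does not spell out how PLA computes its attention weights. As a point of reference only, the sketch below implements the generic Luong "dot" attention score (Luong et al., 2015) that the name presumably refers to: a decoder feature vector acts as the query, encoder feature vectors at each spatial position act as keys and values, and a softmax over positions yields the weights used to build a context vector. This is not the authors' PLA; the function names, the choice of the "dot" score over Luong's "general" or "concat" variants, and the toy feature values are all assumptions for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def luong_dot_attention(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Generic Luong 'dot' attention (an assumption, NOT the paper's PLA).

    score(q, k_s) = q . k_s for every source (encoder) position s,
    weights       = softmax(scores) over source positions,
    context       = sum_s weights[s] * k_s (keys double as values).
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[d] for w, key in zip(weights, keys))
        for d in range(len(query))
    ]
    return weights, context


# Hypothetical toy data: one decoder feature (the query) and three encoder
# features, one per spatial position.
decoder_feature = [1.0, 0.0]
encoder_features = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, context = luong_dot_attention(decoder_feature, encoder_features)
print("weights:", [round(w, 3) for w in weights])
print("context:", [round(c, 3) for c in context])
```

The encoder position whose features align best with the decoder feature receives the largest weight, which is the basic mechanism by which attention shifts focus toward a region of interest.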

Abstract

Computer-aided segmentation methods can assist medical personnel in improving diagnostic outcomes. While recent advances such as UNet and its variants have shown promise, they face a critical challenge: balancing accuracy with computational efficiency. Shallow encoder architectures in UNets often struggle to capture crucial spatial features, leading to inaccurate and sparse segmentation. To address this limitation, we propose a novel Progressive Attention based Mobile UNet (PAM-UNet) architecture. The inverted residual (IR) blocks in PAM-UNet help maintain a lightweight framework, while layerwise Progressive Luong Attention ($\mathcal{PLA}$) promotes precise segmentation by directing attention toward regions of interest during synthesis. Our approach prioritizes both accuracy and speed, achieving a favorable balance with a mean IoU of 74.65 and a Dice score of 82.87 while requiring only 1.32 floating-point operations per second (FLOPS) on the Liver Tumor Segmentation Benchmark (LiTS) 2017 dataset. These results highlight the importance of developing efficient segmentation models to accelerate the adoption of AI in clinical practice.
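The headline numbers are a Dice score and a mean IoU. The sketch below shows the standard definitions for flat binary masks. The paper's exact averaging protocol (per image, per volume, or per class) is not stated here, so the `mean_iou` helper, which averages per-image IoU, is an assumed convention, and the toy masks are hypothetical.

```python
from typing import List, Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int],
                 eps: float = 1e-7) -> Tuple[float, float]:
    """Dice = 2|P∩T| / (|P| + |T|) and IoU = |P∩T| / |P∪T| for flat binary masks.

    `eps` keeps both scores defined (and equal to 1) when both masks are empty.
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    union = p_sum + t_sum - inter
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou


def mean_iou(pairs: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Assumed convention: average the per-image IoU over (pred, target) pairs."""
    return sum(dice_and_iou(p, t)[1] for p, t in pairs) / len(pairs)


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")  # Dice = 0.6667, IoU = 0.5000
```

For a single mask the two scores are linked by Dice = 2·IoU / (1 + IoU), so Dice is never smaller than IoU; the identity holds per mask but not necessarily for averages, which is why mean Dice and mIoU are reported separately.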

Paper Structure (14 sections, 8 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: CKA plot illustrating that PAM-UNet learns meaningful representations from its earliest layers. (a) PAM-UNet shows highly similar representations throughout the model; (b) and (d) a large number of lower layers in UNet and MobileUNet are similar to a smaller set of lower PAM-UNet layers; and (c) PAM-UNet needs a few fewer layers than AttUNet to learn similar representations during training.
  • Figure 2: (a) Overall architecture of the proposed PAM-UNet, (b) diagram of a single attention-gate unit ($\mathcal{PLA}$), and (c) features extracted from intermediate layers of PAM-UNet, showing the effectiveness of the proposed $\mathcal{PLA}$. The predicted masks before and after applying $\mathcal{PLA}$ differ significantly, and the proposed method produces precise segmentation.
  • Figure 3: Segmentation masks generated by PAM-UNet (Ours) and all baselines, which support the quantitative results. PAM-UNet segments both liver and tumour precisely, detecting boundaries better than MobileUNet and AttUNet. Because of overfitting, DeepLabv3+, FCN8 and ResUNet also include peripheral regions in the segmentation mask when segmenting the liver.
  • Figure 4: A corner case in which PAM-UNet fails to detect all 10 polyps; such images are rare in the Kvasir-SEG dataset (no training image contains more than 5 polyps).
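Figure 1 compares layer representations with centered kernel alignment (CKA). The captions do not say which CKA variant is used, so the sketch below assumes linear CKA as defined by Kornblith et al. (2019): center each feature over the examples, then compute ||YᵀX||²_F / (||XᵀX||_F · ||YᵀY||_F). The layer activations are hypothetical toy values, and the pure-Python loops are for readability rather than speed.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = examples, columns = features (neurons/channels)


def center_columns(m: Matrix) -> Matrix:
    """Subtract each feature's mean over the examples."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Return ||a^T b||_F^2 for two matrices with the same number of rows."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(a[r][i] * b[r][j] for r in range(len(a)))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    xc, yc = center_columns(x), center_columns(y)
    num = cross_frobenius_sq(xc, yc)
    den = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return num / den


# Hypothetical activations of three layers for the same 4 inputs.
layer_a = [[1.0, 2.0], [3.0, 1.0], [0.0, 4.0], [2.0, 2.0]]
layer_b = [[-b, a] for a, b in layer_a]   # a 90-degree rotation of layer_a
layer_c = [[0.5], [-1.0], [2.0], [0.0]]   # an unrelated 1-D representation
print(round(linear_cka(layer_a, layer_b), 6))
print(round(linear_cka(layer_a, layer_c), 6))
```

The rotated copy scores 1 because linear CKA is invariant to orthogonal transformations and isotropic scaling of the features. Evaluating `linear_cka` for every pair of layers, within one model or across two models such as PAM-UNet and UNet, and plotting the resulting matrix gives a heat map of the kind described in Figure 1.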